REPORT DOCUMENTATION PAGE

AD-A162 630

4. TITLE (and Subtitle): SEQUENTIAL DECODING FOR MULTIPLE ACCESS CHANNELS
5. TYPE OF REPORT: Thesis
6. PERFORMING ORG. REPORT NUMBER: LIDS-TH-1517
7. AUTHOR: Erdal Arikan
8. CONTRACT OR GRANT NUMBERS: DARPA Order No. 3045/2-2-84, Amendment #11; ONR/N00014-84-K-0357
9. PERFORMING ORGANIZATION NAME AND ADDRESS: Massachusetts Institute of Technology, Laboratory for Information and Decision Systems, Cambridge, Massachusetts 02139
10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS: Program Code No. 5T10; ONR Identifying No. 049-383
11. CONTROLLING OFFICE NAME AND ADDRESS: Defense Advanced Research Projects Agency, 1400 Wilson Boulevard, Arlington, Virginia 22209
12. REPORT DATE: December 1985
13. NUMBER OF PAGES: 112
14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office): Office of Naval Research, Information Systems Program, Code 437, Arlington, Virginia 22217
15. SECURITY CLASS. (of this report): UNCLASSIFIED
16. DISTRIBUTION STATEMENT (of this Report): Approved for public release; distribution unlimited

ABSTRACT

Sequential decoding is a decoding algorithm for tree codes originally
developed for single-user channels (i.e., channels with one transmitter
and one receiver). Sequential decoding relies on what is called a metric
to direct its search and find the path in the tree that corresponds to the
encoded message. The decoding complexity in sequential decoding, that
is, the number of computations to decode a source digit, is a random
variable.  A  rate  is  said  to  be  achievable  by  sequential  decoding  if  it  is 
possible  to  select  a  code  with  that  rate  and  a  metric  such  that  the 
expected  value  of  the  decoding  complexity  is  finite.  In  the  single-user 
case,  the  largest  achievable  rate  is  called  the  cut-off  rate  of  sequential 
decoding. 

Multiple  access  channels  are  models  of  communication  systems  where 
there  are  a  number  of  users  all  sharing  the  same  transmission  medium  to 
communicate  their  messages  to  a  common  receiver.  This  thesis  explores 
the  possibility  of  using  sequential  decoding  on  multiple  access  channels. 
Immediate  generalizations  of  the  metrics,  in  particular  of  the  Fano 
metric,  that  have  been  used  in  the  past  for  single-user  sequential 
decoding,  do  not  work  satisfactorily  in  the  multi-user  case.  A  new 
metric  is  introduced  which  works  quite  satisfactorily  not  only  for 
multiple  access  channels  but  also  for  single-user  ones.  The  achievable 
rate  region  of  sequential  decoding  under  this  new  metric  is  evaluated.  It 
is shown by examples that sequential decoding has the potential of
achieving  rates  (throughputs)  beyond  those  achievable  by  conventional 
ways  of  using  multiple  access  channels,  such  as  time-division 
multiplexing,  frequency  division  multiplexing,  and  Aloha-like  schemes. 

Outer  bounds  to  the  achievable  rate  region  of  sequential  decoding  are 
considered.  The  cut-off  rate  of  sequential  decoding  (in  the  single-user 
case)  is  determined,  thus  settling  a  long-standing  open  question.  Also, 
the  achievable  rate  region  of  sequential  decoding  is  determined  in  the 
case  of  multiple  access  channels  that  have  a  property  known  as 
pairwise-reversibility.  The  achievable  rate  region  of  sequential  decoding 
for  arbitrary  multiple  access  channels  remains  undetermined. 

An  alternative  approach  to  sequential  decoding,  in  which  there  is  a 
separate  sequential  decoder  for  each  user  in  the  system,  is  considered 
and  an  inner  bound  to  its  achievable  rate  region  is  given.  Non-joint 
sequential  decoding,  as  this  approach  is  called,  has  the  advantage  of 
being  simple:  each  sequential  decoder  is  responsible  for  decoding  the 
message  of  a  single  user,  so  it  does  not  have  to  know  the  tree  codes  of 
the  other  users.  An  example  is  given  for  which  non-joint  sequential 
decoding,  in  addition  to  being  simpler,  also  achieves  rates  that  are 
unachievable  by  ordinary  sequential  decoding. 

Chapter I
INTRODUCTION 


Multiple  access  channels  are  models  of  communication  systems  in  which 
there  are  a  number  of  uncoordinated  users  sharing  a  transmission 
medium  to  transmit  messages  to  a  common  destination.  Some  examples 
of  multiple  access  channels  are  a  satellite  transponder  shared  by  several 
ground  stations,  a  radio  network  in  which  users  transmit  over  the  same 
frequency  band  to  exchange  messages,  and  a  computer  network  where 
several  computers  send  messages  over  a  common  bus. 

One  common  approach  to  multiple  access  communications  is  to  employ 
time-sharing  (time-division  multiplexing),  in  which  at  any  given  time 
only  one  user  is  allowed  to  transmit  a  message.  This  idea  of  splitting  a 
given  channel  into  non-interfering  subchannels  and  giving  the  use  of  each 
subchannel  exclusively  to  a  single  user  also  underlies  frequency-division 
multiplexing  and  other  techniques  that  aim  at  elimination  of  multi-user 
interference. 

Another  approach,  which  is  much  less  common  than  time-sharing,  is  to 
let  all  users  transmit  simultaneously,  thus  allowing  them  to  interfere 
with  each  other.  In  this  approach,  a  sufficient  amount  of  redundancy  is 
embedded into what is transmitted by each user so that, with high
probability, the receiver can reconstruct the messages correctly. This is
the coding approach to multiple access communications. Theoretically,
coding affords a channel utilization (throughput) always as high as, and
often significantly higher than, what is possible by time-sharing. The
reason for being interested in coding for multiple access channels is
thus the desire to communicate at higher rates, or more reliably at a
given  rate. 


While coding is potentially superior to time-sharing in terms of
throughput, it requires more complexity in the form of encoders and
decoders. In addition, there is the problem of finding an encoder-decoder
pair achieving a given desired rate. This thesis examines a particular
approach to coding for multi-access channels, namely, tree coding and
sequential decoding, and establishes it as a practically applicable
method for achieving rates beyond those achievable by time-sharing.


1.1.  The  Multiple  Access  Channel  Model 

The multiple access channel model used in this thesis has, as its central
element, a channel (in the information theoretic sense of the word),
which has one input for each user and a single output to the common
destination (Figure 1.1.1).


Figure 1.1.1. Multi-user communication system model (sources 1 to n, each feeding its own encoder; a common channel; a single decoder).


Our study is restricted to the class of channels which have the following
properties. 


1)  The  channel  operates  in  discrete  time;  it  can  be  used  only  once  a 
second,  say. 


2)  The  channel  is  discrete;  that  is,  the  channel  input  and  output 
alphabets  are  finite  sets. 


3) The channel is memoryless and stationary. Memorylessness is the
property that the statistics of the output at any given time depend only
on the inputs at that time, and possibly on the time itself. Stationarity
rules out the dependence of channel statistics on time.

A channel in this class with $n$ users can be identified by its input
alphabets $X_1,\ldots,X_n$, its output alphabet $Y$, and its transition probabilities
$P=\{P(\eta\mid\xi):\ \eta\in Y,\ \xi\in X_1\times\cdots\times X_n\}$. $P(\eta\mid\xi)$ is the probability of receiving $\eta$
given that $\xi$ is transmitted. If $\xi=(\xi_1,\ldots,\xi_n)$, an alternative notation for
$P(\eta\mid\xi)$ is $P(\eta\mid\xi_1,\ldots,\xi_n)$; $P(\eta\mid\xi_1,\ldots,\xi_n)$ is thus the probability that $\eta$ is
received given that user $i$ transmits $\xi_i$, $i=1,\ldots,n$. A channel with these
parameters will be denoted by $(P;X_1,\ldots,X_n;Y)$.

The  encoders  in  this  model  are  what  we  call  (M,k)  encoders,  where  M  and 
k  are  arbitrary  positive  integers.  An  (M,k)  encoder  is  a  device  which 
sends  k  symbols  to  the  channel  for  each  digit  it  receives  from  the 
source;  M  designates  the  size  of  the  source  alphabet. 

In general, each user may have encoders with arbitrary parameters, say,
$(M_i,k_i)$ for user $i$, $i=1,\ldots,n$. We shall, however, consider only those cases
where $k_i$ is the same for all $i$, and denote the parameter of such a
collection of encoders by $(M_1,\ldots,M_n,k)$.

A source for an (M,k) encoder is viewed as an infinite shift-register
holding  digits  from  a  set  with  M  elements.  It  is  assumed  that  each  digit 
in  each  source  register  is  a  random  variable,  uniformly  distributed,  and 
independent  of  all  other  source  digits  in  the  same  or  in  other  registers. 
Viewing  the  sources  in  this  way  eliminates  the  source  coding  problem, 
and  thus,  enables  us  to  focus  on  the  problem  of  channel  coding,  which  is 
the  problem  of  main  interest  here. 

At  this  point,  we  view  the  decoder  quite  generally  as  any  device  that 
generates  an  estimate  for  each  source  digit. 


Notice that, as a result of the statistical independence at the source
level and the lack of cooperation among the users in the encoding of
their messages, the inputs to the channel by different users are
statistically independent. This is the essential difference between a
multi-user channel, say, $(P;X_1,\ldots,X_n;Y)$ and its single-user counterpart
$(P;X_1\times\cdots\times X_n;Y)$.

The  two  main  performance  criteria  for  the  analysis  of  this  model  will  be 
the  expected  system  delay  and  the  probability  of  decoding  error.  System 
delay  for  a  source  digit  is  defined  as  the  time  lag  from  the  time  that 
digit  is  accepted  by  its  encoder  to  the  time  the  decoder  delivers  its 
estimate  about  that  digit.  System  delay  is  permitted  to  be  a  random 
variable;  but  clearly,  a  system  can  not  be  used  in  practice  unless  the 
expected  system  delay  is  uniformly  bounded  over  all  source  digits. 

Probability  of  decoding  error  for  a  source  digit  is  the  probability  that 
the  decoder  estimate  for  that  digit  is  in  error.  We  are  interested  in 
finding  ways  of  reducing  the  probability  of  decoding  error  to  arbitrarily 
low  levels  for  each  source  digit,  while  keeping  the  expected  system 
delay  bounded. 

In  order  to  describe  the  model  precisely,  and  also  for  future  reference, 
we  now  list  the  notation  that  will  be  used  throughout  this  thesis. 


Transmissions  start  at  time  1,  and  take  place  at  times  1,2,3,... 

As  a  convention,  in  the  following  notation,  subscripts  refer  to  user 
identity,  arguments  refer  to  time. 

Generically, $e_i$ stands for the encoder (the device) and the encoding
operation for user $i$; the parameter of $e_i$ is denoted by $(M_i,k)$; and the
number of users is denoted by $n$. $e$ denotes the collection of encoders
$e_1,\ldots,e_n$, and also the joint encoding operation.

$s_i(m)$ is the $m$th input to $e_i$, or equivalently, the $m$th output of source $i$.
$s_i(..m)=(s_i(1),\ldots,s_i(m))$ is the first $m$ inputs to $e_i$.
$s_i=s_i(1),s_i(2),\ldots$ is the input sequence to $e_i$.

It is important to note that $s_i$ denotes the actual output of source $i$.
Throughout what follows, the letter $s$ is reserved for denoting actual
source outputs. When there is need to mention a possible but arbitrary
output sequence for source $i$, we write $u_i$ or $u_i'$ or $v_i$, but never $s_i$. Thus,
$u_i$ denotes an arbitrary sequence of letters from $\{1,\ldots,M_i\}$. We denote the
$m$th letter of $u_i$ by $u_i(m)$, and the first $m$ letters of $u_i$ by $u_i(..m)$.


$x_i(m)$ is the $m$th output block of $e_i$, $m=1,2,\ldots$
$x_i(m,j)$ is the $j$th digit of $x_i(m)$, $j=1,\ldots,k$.
$x_i(..m)=(x_i(1),\ldots,x_i(m))$ is the first $m$ output blocks of $e_i$.
$x_i=x_i(1,1),\ldots,x_i(1,k),x_i(2,1),\ldots$ is the output sequence of $e_i$.

$x_i$ is the actual output of encoder $e_i$; in other words, it is the sequence
of channel symbols transmitted by user $i$. $x_i$ and $s_i$ are related through
the equation $x_i(m)=e_i(s_i(..m))$. As stated earlier, $e_i$ is regarded not only
as a device (the encoder) but also as the encoding operation itself. In
this second sense, $e_i$ is a causal operator mapping source sequences into
channel input sequences.

Our model allows $x_i(m)$ to depend on all of $s_i(1),\ldots,s_i(m)$, no matter how
large $m$ is. If $x_i(m)$ does not depend on $s_i(m-b-1)$ for any $b\ge b_0$ and $b_0$ is
the smallest integer with this property, then $b_0$ is said to be the memory
of $e_i$.


Encoders with zero memory are called block encoders, and they will be
discussed  in  the  next  section.  The  discussion  of  block  codes  aims  at 
introducing  certain  theorems  that  are  useful  in  understanding  the  coding 
problem  in  multi-access  channels.  Our  focus  in  this  thesis  is  on  tree 
codes,  which  are  generated  by  encoders  that  may  have  arbitrarily  large 
memories. 

We often use the following notation for the actual channel inputs:

$e_is_i=x_i$ , $e_is_i(m)=x_i(m)$ , $e_is_i(..m)=x_i(..m)$ .

We use the following notation in relation to what would be observed as
the output of $e_i$ if $u_i$ were the input to $e_i$.

$e_iu_i(m)=e_i(u_i(..m))$, the $m$th output block of $e_i$ in response to $u_i$.
$e_iu_i(..m)=(e_iu_i(1),\ldots,e_iu_i(m))$, the first $m$ blocks in response to $u_i$.
$e_iu_i=e_iu_i(1),e_iu_i(2),\ldots$, the output sequence in response to $u_i$.

$s(m)=(s_1(m),\ldots,s_n(m))$ is the $m$th input to $e$.
$s(..m)=(s(1),\ldots,s(m))$ is the first $m$ inputs to $e$.
$s=s(1),s(2),\ldots$ is the input sequence to $e$.

$x(m,j)=(x_1(m,j),\ldots,x_n(m,j))$ is the $j$th digit in the $m$th output block of $e$.
$x(m)=(x(m,1),\ldots,x(m,k))$ is the $m$th output block of $e$.
$x(..m)=(x(1),\ldots,x(m))$ is the first $m$ output blocks of $e$.
$x=x(1,1),\ldots,x(1,k),x(2,1),\ldots$ is the output sequence of $e$.

The functional relationship between the joint source output $s$ and the
joint channel input $x$ will be expressed by writing $x(m)=e(s(..m))$. Thus, $e$
is regarded both as a label for the collection of all encoders and as an
operator mapping sequences of letters from $\{1,\ldots,M_1\}\times\cdots\times\{1,\ldots,M_n\}$ into
sequences of letters from $X_1\times\cdots\times X_n$. In this second sense, $e$ is an
encoder with parameter $(M_1\cdots M_n,k)$, input alphabet $\{1,\ldots,M_1\}\times\cdots\times\{1,\ldots,M_n\}$,
and output alphabet $X_1\times\cdots\times X_n$.

We often use the following notation for joint channel inputs.

$es=x$ , $es(m)=x(m)$ , $es(..m)=x(..m)$ .

As in the case of individual source sequences, the letter $s$ is reserved
for denoting the actual joint source outputs. Arbitrary joint source
sequences are denoted by $u$ or $u'$ or $v$, etc. Thus, $u$ denotes a sequence of
elements from $\{1,\ldots,M_1\}\times\cdots\times\{1,\ldots,M_n\}$; $u(m)$ denotes the $m$th letter of $u$;
and $u(..m)$ denotes the first $m$ letters of $u$.

We  use  the  following  notation  in  relation  to  what  would  be  observed  as 
the  output  of  e  if  u  were  the  input  to  e. 

$eu(m)=e(u(..m))$, the $m$th output block of $e$ in response to $u$.
$eu(..m)=(eu(1),\ldots,eu(m))$, the first $m$ blocks in response to $u$.
$eu=eu(1),eu(2),\ldots$, the output sequence in response to $u$.

$y(m,j)$ is the channel output in response to $x(m,j)$.
$y(m)=(y(m,1),\ldots,y(m,k))$ is the $m$th channel output block.
$y(..m)=(y(1),\ldots,y(m))$ is the first $m$ channel output blocks.
$y=y(1,1),\ldots,y(1,k),y(2,1),\ldots$ is the channel output sequence.

$z_i(m)$ is the decoder estimate for $s_i(m)$.
$z(m)=(z_1(m),\ldots,z_n(m))$ is the decoder estimate for $s(m)$.

An error in the decoding of $s_i(m)$ is the event that $z_i(m)\ne s_i(m)$.

This completes the basic list of notation.

We now introduce an operation to simplify the notation.

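For any collection of sets $A_1,\ldots,A_n$, any integer $t$, and any collection of
$\xi_i=(\xi_{i,1},\ldots,\xi_{i,t})\in A_i^t$, $i=1,\ldots,n$, we define

$\xi_1\times\xi_2\times\cdots\times\xi_n=((\xi_{1,1},\xi_{2,1},\ldots,\xi_{n,1}),\ldots,(\xi_{1,t},\xi_{2,t},\ldots,\xi_{n,t}))$.

If $\xi_i=\xi_{i,1},\xi_{i,2},\ldots$ with $\xi_{i,j}\in A_i$, then we define

$\xi_1\times\xi_2\times\cdots\times\xi_n=(\xi_{1,1},\xi_{2,1},\ldots,\xi_{n,1}),(\xi_{1,2},\xi_{2,2},\ldots,\xi_{n,2}),\ldots$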

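Some of the preceding relations can now be restated as follows.

$s(m)=s_1(m)\times\cdots\times s_n(m)$ , $s(..m)=s_1(..m)\times\cdots\times s_n(..m)$ , $s=s_1\times\cdots\times s_n$ .

$x(m)=x_1(m)\times\cdots\times x_n(m)$ , $x(..m)=x_1(..m)\times\cdots\times x_n(..m)$ , $x=x_1\times\cdots\times x_n$ .

$es=e_1s_1\times\cdots\times e_ns_n$,
$es(m)=e_1s_1(m)\times\cdots\times e_ns_n(m)$,
$es(..m)=e_1s_1(..m)\times\cdots\times e_ns_n(..m)$.

If $u_i$ is an arbitrary input sequence for $e_i$, $i=1,\ldots,n$, and $u=u_1\times\cdots\times u_n$, then

$eu=e_1u_1\times\cdots\times e_nu_n$,
$eu(m)=e_1u_1(m)\times\cdots\times e_nu_n(m)$,
$eu(..m)=e_1u_1(..m)\times\cdots\times e_nu_n(..m)$.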
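
As an illustration, the operation × amounts to regrouping n equal-length sequences digit by digit. The following Python sketch is not part of the thesis: the function name `interleave` and the sample blocks are hypothetical, and only finite sequences are handled.

```python
def interleave(*seqs):
    """Component-wise product xi_1 x ... x xi_n of equal-length sequences.

    The j-th element of the result collects the j-th element of every
    input sequence, one per user.
    """
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all sequences must have the same length t")
    return tuple(zip(*seqs))


# Hypothetical output blocks of two users with k = 2 digits per block.
x1_block = (0, 1)   # x_1(m)
x2_block = (1, 1)   # x_2(m)

# Joint block x(m) = x_1(m) x x_2(m): one pair of channel inputs per digit.
x_block = interleave(x1_block, x2_block)
print(x_block)
```

Each element of `x_block` is the tuple of symbols that the users feed to the channel in the same digit position, which is how x(m,j) is defined in the notation list.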

1.2. Capacity and Coding for Multiple Access Channels

Interest  in  multiple  access  channels  (and  other  types  of  multi-user 
channels) goes back to Shannon's 1961 paper [1]. Since the publication of
that  paper  considerable  theoretical  work  has  been  done  about  such 
channels.  This  section  presents  two  well-known  results  about  multiple 
access  channels  which  provide  the  motivation  and  the  framework  for  the 
work  reported  in  this  thesis.  To  keep  the  notation  simple,  the  discussion 
is  limited  to  the  two-user  case. 

Two-User  Block  Coding 

An $(M_1,M_2,k)$ block code for a two-user channel with input alphabets $X_1$
and $X_2$ is a mapping

$f:\{1,\ldots,M_1\}\times\{1,\ldots,M_2\}\to(X_1\times X_2)^k$

which has the property that, for each $(i,j)\in\{1,\ldots,M_1\}\times\{1,\ldots,M_2\}$,

$f(i,j)=f_1(i)\times f_2(j)$,

for some pair of functions $f_1$ and $f_2$ such that

$f_1:\{1,\ldots,M_1\}\to X_1^k$,
$f_2:\{1,\ldots,M_2\}\to X_2^k$.

The operation $\times$ is as defined in §1.1.

The above definition forces a two-user block code $f$ to be decomposable
into two component block codes $f_1$ and $f_2$. This reflects the requirement
that in a two-user channel the channel inputs must be independently
encoded.


The implementation of a block code $f$, with component codes $f_1$ and $f_2$, on
a channel $K=(P;X_1,X_2;Y)$ results in the following functional relationships.

$x_1(m)=f_1(s_1(m))$,
$x_2(m)=f_2(s_2(m))$,
$x(m)=f(s(m))$.

Any function $g$, $g:Y^k\to\{1,\ldots,M_1\}\times\{1,\ldots,M_2\}$, can be used as a decoder
for the above block code by simply letting $z(m)=g(y(m))$.


An error is said to occur in the decoding of $s(m)$ if $s(m)\ne z(m)$. Under our
assumption that the source output letters are independent and uniformly
distributed, the probability of $s(m)\ne z(m)$ is independent of $m$; it equals

$$P_e(f,g)=\sum_{i=1}^{M_1}\sum_{j=1}^{M_2}\frac{1}{M_1M_2}\sum_{\eta\in Y^k:\ g(\eta)\ne(i,j)}P(\eta\mid f(i,j)).$$

$P_e(f,g)$ is minimized if $g$ has the property that, for each $\eta\in Y^k$, $g(\eta)=(i,j)$
only if $P(\eta\mid f(i,j))\ge P(\eta\mid f(h,m))$ for all $(h,m)\in\{1,\ldots,M_1\}\times\{1,\ldots,M_2\}$. Such a
decoder is called a maximum-likelihood (ML) decoder. The way ties are
broken in ML decoding does not affect the probability of decoding error;
so, we denote the probability of error for ML decoders by $P_e(f)$.
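
The error probability of ML decoding can be evaluated by brute force for very small codes. The sketch below is an illustration only and is not taken from the thesis: the dictionary layout `P[(output, (input1, input2))]`, the function names, and the example channel (binary inputs whose integer sum is received without noise) are all assumptions made here. It uses the fact that an ML decoder is correct with probability $(1/M_1M_2)\sum_{\eta}\max_{(i,j)}P(\eta\mid f(i,j))$, with block probabilities computed as products because the channel is memoryless.

```python
from itertools import product
from math import prod


def block_prob(P, eta, xi):
    # Memoryless channel: block probability is the product of per-digit terms.
    return prod(P.get((y, x), 0.0) for y, x in zip(eta, xi))


def ml_error_probability(P, Y, f1, f2):
    """P_e(f) under ML decoding for the code f(i,j) = f1[i] x f2[j]."""
    k = len(f1[0])
    # Joint codewords: k pairs (digit of user 1, digit of user 2).
    codewords = [tuple(zip(c1, c2)) for c1 in f1 for c2 in f2]
    # Sum over all received blocks of the largest likelihood, then average
    # over the equally likely message pairs.
    p_correct = sum(
        max(block_prob(P, eta, xi) for xi in codewords)
        for eta in product(Y, repeat=k)
    ) / len(codewords)
    return 1.0 - p_correct


# Assumed example channel: the output is the integer sum of two binary inputs.
Y = (0, 1, 2)
P = {(a + b, (a, b)): 1.0 for a in (0, 1) for b in (0, 1)}

# Both users send one uncoded binary digit (k = 1): the pairs (0,1) and (1,0)
# produce the same output, so one of the four message pairs is always missed.
pe_uncoded = ml_error_probability(P, Y, [(0,), (1,)], [(0,), (1,)])
print(pe_uncoded)
```

The enumeration over all of $Y^k$ grows exponentially with k, so this is only practical for toy block lengths.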


Capacity Region


The capacity region $\bar{C}(K)$ of a two-user channel $K=(P;X_1,X_2;Y)$ is defined
as the closure of the following region.

$$C(K)=\text{convex-hull}\bigcup_{Q_1,Q_2}C(Q_1,Q_2)$$

where the union is over all $Q_1$ and $Q_2$ such that $Q_1$ is a probability
distribution (p.d.) on $X_1$ and $Q_2$ is a p.d. on $X_2$; and $C(Q_1,Q_2)$ is defined as
the set of points $(R_1,R_2)$ such that

$$0\le R_1<\sum_{\xi_1\in X_1}Q_1(\xi_1)\sum_{\xi_2\in X_2}Q_2(\xi_2)\sum_{\eta\in Y}P(\eta\mid\xi_1,\xi_2)\ln\frac{P(\eta\mid\xi_1,\xi_2)}{\sum_{\zeta\in X_1}Q_1(\zeta)P(\eta\mid\zeta,\xi_2)},$$

$$0\le R_2<\sum_{\xi_1\in X_1}Q_1(\xi_1)\sum_{\xi_2\in X_2}Q_2(\xi_2)\sum_{\eta\in Y}P(\eta\mid\xi_1,\xi_2)\ln\frac{P(\eta\mid\xi_1,\xi_2)}{\sum_{\zeta\in X_2}Q_2(\zeta)P(\eta\mid\xi_1,\zeta)},$$

$$R_1+R_2<\sum_{\xi_1\in X_1}Q_1(\xi_1)\sum_{\xi_2\in X_2}Q_2(\xi_2)\sum_{\eta\in Y}P(\eta\mid\xi_1,\xi_2)\ln\frac{P(\eta\mid\xi_1,\xi_2)}{\sum_{\zeta_1\in X_1}Q_1(\zeta_1)\sum_{\zeta_2\in X_2}Q_2(\zeta_2)P(\eta\mid\zeta_1,\zeta_2)}.$$

Theorem 1.2.1. (Ahlswede [2], Liao [3])

For any two-user channel $K=(P;X_1,X_2;Y)$ and any pair of real numbers $R_1$
and $R_2$, we have:

i) If $(R_1,R_2)\in C(K)$, then, for any $\epsilon>0$, there exists an $(M_1,M_2,k)$ block code $f$
such that $P_e(f)<\epsilon$ and $(1/k)\ln M_i\ge R_i$, $i=1,2$.

ii) If $(R_1,R_2)$ lies outside $\bar{C}(K)$, then $P_e(f,g)$ is bounded away from zero
for all $f$ and $g$, so long as $(M_1,M_2,k)$, the parameter of $f$, is such that
$(1/k)\ln M_i\ge R_i$, $i=1,2$. □

In words, Theorem 1.2.1 states that, for any channel $K$, i) communication
with arbitrarily low probability of error is possible if the source rates
lie in $C(K)$, and ii) probability of error cannot be made arbitrarily small
(i.e., reliable communication is not possible) if the source rates lie
outside $\bar{C}(K)$. The theorem does not assert anything about points which
belong to $\bar{C}(K)$ but not to $C(K)$.

Example  1.2.1. 

To  illustrate  the  capacity  theorem  and  to  explain  certain  approaches  to 
multi-access  communications,  we  now  discuss  the  two-user  erasure 
channel  (TEC)  of  Figure  1.2.1.  We  observe  from  the  figure  and  by  the 
channel capacity theorem that sum rates, $R_1+R_2$, of up to 1.5 bits are
achievable (with arbitrarily small probability of error) by using block codes.


Figure  1.2.1.  Two-user  erasure  channel  and  its  capacity  region. 


Let  us  look  at  some  simple  block  codes  for  this  channel.  It  is  easy  to  see 
that  the  following  code  achieves  the  rate  pair  (0.5  bits,  0.5  bits)  with 
zero  probability  of  error. 

Code 1.

User 1
Message  Codeword
1        00
2        01

User 2
Message  Codeword
1        00
2        10

In this code, the first user sends no information in the first digit of a
codeword (it always transmits a 0); similarly, the second user is 'quiet'
in the second digit of each codeword. For this reason, this code is said
to have no multi-user interference: user 1's message can be estimated
independently of user 2's message without any loss of optimality. Thus,
elimination of multi-user interference simplifies decoding, but codes
without multi-user interference are limited to sum rates of at most 1
bit in the case of the TEC, which is significantly below the theoretically
possible 1.5 bits.


Code 1 is typical of a class of straightforward approaches to multiple
access  communications,  such  as  time  division  multiplexing,  frequency 
division  multiplexing,  and  the  like,  which  are  based  on  the  idea  of 
splitting  the  channel  into  non-interfering  subchannels  and  giving  the  use 
of  each  subchannel  exclusively  to  a  single  user.  The  main  advantage  of 
these  approaches  is  the  ease  of  decoding,  but  as  here,  their  operation  is 
often  restricted  to  a  small  portion  of  the  capacity  region.  Coding  for 
multiple  access  channels  aims,  at  the  very  least,  at  finding  practical 
techniques  for  achieving  rates  beyond  what  is  achievable  by  such  simple 
schemes. 

One can easily improve upon Code 1; for example, Kasami and Lin [4] give
the following code, which achieves a sum rate of $0.5+(1/2)\log_2 3\approx 1.3$
bits.

Code 2.

User 1
Message  Codeword
1        00
2        11

User 2
Message  Codeword
1        01
2        10
3        11


In this code, unlike the previous one, both users transmit information in
both  digits  of  each  codeword;  as  a  result,  each  received  digit  is 
corrupted  by  multi-user  interference.  Hence,  if  optimality  is  desired,  the 
decoder  must  deal  with  the  codes  of  both  users  simultaneously.  So,  an 
increase  in  the  rates  comes  at  the  cost  of  increased  decoding 
complexity.  As  a  general  rule,  allowing  the  users  to  interfere  with  each 
other  requires  untangling  a  more  complicated  set  of  possibilities  at  the 
decoder,  hence,  an  increased  decoding  complexity. 
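
The zero-error property and the sum rates of Code 1 and Code 2 can be checked by enumerating all pairs of codewords. The sketch below is an illustration only. Since Figure 1.2.1 is not reproduced here, it assumes that the TEC delivers the common input symbol when both users send the same binary digit and an erasure (written "e") when they differ; the function names are hypothetical.

```python
from itertools import product
from math import log2


def tec(a, b):
    # Assumed channel law: agreement passes through, disagreement is erased.
    return a if a == b else "e"


def received(c1, c2):
    # Digit-by-digit channel output for a pair of codewords.
    return tuple(tec(a, b) for a, b in zip(c1, c2))


def uniquely_decodable(code_a, code_b):
    # Zero-error decoding is possible if and only if every pair of
    # messages produces a different received block.
    outputs = [received(c1, c2) for c1, c2 in product(code_a, code_b)]
    return len(set(outputs)) == len(outputs)


def sum_rate_bits(code_a, code_b):
    k = len(code_a[0])
    return (log2(len(code_a)) + log2(len(code_b))) / k


code1 = ([(0, 0), (0, 1)], [(0, 0), (1, 0)])
code2 = ([(0, 0), (1, 1)], [(0, 1), (1, 0), (1, 1)])

for name, (ca, cb) in (("Code 1", code1), ("Code 2", code2)):
    print(name, uniquely_decodable(ca, cb), round(sum_rate_bits(ca, cb), 3))
```

Under the assumed channel law both codes give distinct received blocks for all message pairs, but only the decoder for Code 2 has to consider the two users' codewords jointly.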

If  we  wish  to  communicate  at  still  higher  sum  rates,  and  at  the  same 
time  keep  the  probability  of  error  below  a  given  level,  we  find  out  that 
codes  with  longer  block  lengths  must  be  considered.  The  channel  capacity 
theorem  does  not  tell  us  how  large  the  block  length  has  to  be  before  we 
can  be  sure  that  there  exists  a  block  code  with  that  block  length  which 
satisfies  our  rate  and  reliability  requirements;  the  following  theorem 
provides  an  answer  to  this  question. 

Theorem 1.2.2. (Slepian and Wolf [5])

For any two-user channel $K$, there exists a function $E_K(R_1,R_2)$ which has
the following properties. 1) $E_K(R_1,R_2)$ is positive if $(R_1,R_2)\in C(K)$ and zero
otherwise. 2) For any $(R_1,R_2)$, there exists a block code $f$ with parameter
$(M_1,M_2,k)$ such that a) $(1/k)\ln M_i\ge R_i$ for $i=1,2$ and b) $P_e(f)\le\exp[-kE_K(R_1,R_2)]$.


For the purposes of our discussion, the explicit form of $E_K(R_1,R_2)$ is not
important. The important point is that, for any given rate in $C(K)$, this
theorem establishes the possibility of making the probability of decoding
error at that rate approach zero exponentially by increasing the block
length. This suggests a favorable trade-off between reliability and
system complexity, as long as the desired rate is in $C(K)$. A more
complete  discussion  of  this  issue  lies  outside  the  scope  of  this  thesis. 
For  that  the  interested  reader  is  referred  to  [6],  which  covers  all  the 
material  given  up  to  here  in  greater  detail  and  from  a  broader 
perspective,  and  also  gives  an  overview  of  several  approaches  to  coding 
for  multiple  access  channels,  which  we  will  not  discuss  at  all. 


1.3. Multi-User Tree Codes


A multi-user tree code is simply another name for the joint encoding
operation described in §1.1. The name derives from the fact that the
mapping generated by causal encoders with long memory is most easily
visualized as a tree. This section starts by considering a single-user
tree code to introduce the basic terminology and concepts; then a
two-user tree code is considered; next the form of the concepts and the
notation for an arbitrary number of users is indicated; and, finally,
random tree code ensembles are introduced.


As in the case of encoders, a single-user tree code with parameter (M,k)
has an input alphabet of size M and, for each source digit accepted, it
generates k channel digits. The rate of such a tree code is defined as
$(1/k)\ln M$ (nats) or, equivalently, as $(1/k)\log_2 M$ (bits).

As an example, consider a (2,2) tree code for which the source and the
channel alphabets are both equal to {0,1} and the encoding operation e is
defined as follows.

$$
e(u(..m)) = \begin{cases} (u(1),\,u(1)) & \text{for } m=1; \\ (u(m-1)+u(m),\,u(m)) & \text{for } m=2,3,\ldots \end{cases}
$$

Here, + denotes modulo 2 addition, and u denotes an arbitrary source
sequence.

The  first  three  levels  of  the  code  tree  for  e  are  shown  in  Figure  1.3.1. 
The  tree  representation  is  based  on  establishing  a  one-to-one  mapping 
from  source  sequences  to  paths  in  the  tree.  In  the  present  example,  the 
mapping  is  indicated  by  the  arrows  at  the  left  side  of  the  diagram.  In 
order  to  generate  the  encoded  sequence,  the  encoder  uses  the  source 
output as a sequence of instructions and follows the "upper" or the
"lower" branch going out from the current node depending on whether the
next source digit is, respectively, a 0 or a 1.

Figure 1.3.1. Example of a single-user tree code.


For example, if the first three digits of the source output are 0,1,0, then
the first three blocks (branches) of the encoded sequence are 00, 11, 10.
Thus, each source sequence is mapped to a unique path. Hence, we refer
to source sequences as paths and to initial segments of source sequences
as nodes. For any path u, and any m=1,2,..., the branch connecting node
u(..m-1) (for m=1, take u(..m-1) as the origin) to node u(..m) is labelled
by e(u(..m)).
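
The encoding rule of this example is small enough to write out directly. The following is a minimal sketch in Python; the function name `encode` and the representation of a source sequence as a list of 0/1 integers are choices made here for illustration, not notation from the text.

```python
def encode(source_bits):
    """Return the branch labels e(u(..m)), m = 1, 2, ..., of the (2,2) example code."""
    labels = []
    for m, bit in enumerate(source_bits):
        if m == 0:
            # First branch: (u(1), u(1)).
            labels.append((bit, bit))
        else:
            # Later branches: (u(m-1) + u(m) mod 2, u(m)).
            prev = source_bits[m - 1]
            labels.append(((prev + bit) % 2, bit))
    return labels


print(encode([0, 1, 0]))  # [(0, 0), (1, 1), (1, 0)]
```

The printed labels agree with the blocks 00, 11, 10 quoted above for the source digits 0,1,0.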

In the tree representation of a (M,k) tree code, each node at each level is
connected to M nodes at the next higher level; each branch is labelled by
a block of k channel input digits; M is referred to as the degree of the
tree.


The  path  corresponding  to  s,  the  actual  source  sequence,  is  called  the 
correct  path.  Nodes  on  the  correct  path  are  called  the  correct  nodes.  The 
branch  labels  on  the  correct  path  are  thus  the  channel  symbols  that  get 
transmitted  over  the  channel. 

Two-User Tree Codes

We illustrate the relationship between a pair of single-user tree codes,
$e_1$ and $e_2$, and the corresponding joint two-user tree code, e, by using
the example shown in Figure 1.3.2. We observe that the parameters of $e_1$
and $e_2$ are both equal to (2,2). In general, if $(M_1,k)$ and $(M_2,k)$ are the
parameters of $e_1$ and $e_2$, then $(M_1,M_2,k)$ is the parameter of e. So, here,
the parameter of e is (2,2,2).

With reference to Figure 1.3.2, observe that, for each pair of nodes,
$u_1(..m)$ in $e_1$ and $u_2(..m)$ in $e_2$, $u_1(..m) \times u_2(..m)$ is a node in e.
Likewise, for each pair of paths, $u_1$ in $e_1$ and $u_2$ in $e_2$, $u_1 \times u_2$ is a
path in e.

The path $s = s_1 \times s_2$, where $s_1$ is the correct path in $e_1$ and $s_2$ is the
correct path in $e_2$, is called the joint correct path, or the correct path in
e.


Basic Concepts and Notation for Multi-User Tree Codes

Generically, $e_i$ denotes the tree code for user i, and e denotes the joint
tree code. $(M_i,k)$ denotes the parameter of $e_i$; n denotes the number of
users; and $(M_1,\ldots,M_n,k)$ denotes the parameter of e. The rate of $e_i$ is
defined as $R_i = (1/k)\ln M_i$, and that of e as $(R_1,\ldots,R_n)$.

If $u_i$ is a path in $e_i$ for each $i \in \{1,\ldots,n\}$, then $u_1 \times \cdots \times u_n$ is a
path in e. It is called the product path or the joint path corresponding to
$u_1,\ldots,u_n$. $u_i$ is said to be a component path of $u_1 \times \cdots \times u_n$.

The path in $e_i$ corresponding to $s_i$, the actual source output, is called the
correct path in $e_i$; $s_1 \times \cdots \times s_n$ is called the correct path in e, or the
joint correct path. Nodes on $s_1 \times \cdots \times s_n$ are called correct nodes.

If $u_i(..m)$ is a node in $e_i$ for each $i \in \{1,\ldots,n\}$, then
$u_1(..m) \times \cdots \times u_n(..m)$ is a node in e. It is called the joint node
or the product node corresponding to $u_1(..m),\ldots,u_n(..m)$. $u_i(..m)$ is said
to be a component node of $u_1(..m) \times \cdots \times u_n(..m)$.

For any pair of nodes in e, $u(..m) = u_1(..m) \times \cdots \times u_n(..m)$ and
$\hat{u}(..m) = \hat{u}_1(..m) \times \cdots \times \hat{u}_n(..m)$, the type of $\hat{u}(..m)$ with
respect to $u(..m)$ is defined as the vector $(T_1,\ldots,T_m)$ where $T_j$,
$1 \le j \le m$, is the set of i such that $\hat{u}_i(..j) \ne u_i(..j)$.
(For example, in Figure 1.3.2, the type of node ((1,1),(1,0)) with respect
to ((1,1),(0,0)) is $(\emptyset,\{1\})$.)

For any node $\hat{u}(..m)$ and any path u in e, the type of $\hat{u}(..m)$ with
respect to u is defined as the type of $\hat{u}(..m)$ with respect to u(..m).

For any path u in e, the mth ($m \ge 1$) incorrect subtree of u, denoted by
$I_m(u)$, is defined as the set of nodes $\hat{u}(..j)$ in e such that
a) $j \ge m$, b) $\hat{u}(..m) \ne u(..m)$, and c) if $m \ge 2$,
$\hat{u}(..m-1) = u(..m-1)$.

The number of types of nodes at level m equals $(m+1)^n$. This can be seen
by observing that, if $(T_1,\ldots,T_m)$ is the type of a node, $T_j$ must be a
subset of $T_h$ for all $h > j$. Thus, for each user, there are m+1 ways that
that user first appears (one possibility is that it never appears) in the
sequence of sets $T_1,\ldots,T_m$.
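
The counting argument can be checked by brute force for small n and m. The sketch below assumes only what the argument itself uses, namely that a type is a nested sequence of subsets of the user set; the function name `count_types` and the 0-based user labels are illustrative.

```python
from itertools import product


def count_types(n, m):
    """Count vectors (T_1, ..., T_m) of subsets of n users with T_1 <= T_2 <= ... <= T_m."""
    users = range(n)
    # All 2**n subsets of the user set, built from bit masks.
    subsets = [frozenset(u for u in users if (mask >> u) & 1) for mask in range(2 ** n)]
    count = 0
    for chain in product(subsets, repeat=m):
        # Keep the chain only if each set is contained in the next one.
        if all(chain[j] <= chain[j + 1] for j in range(m - 1)):
            count += 1
    return count


print(count_types(2, 3))  # 16
```

For two users at level 3 the enumeration gives 16, which is (3+1) squared.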


Ensembles  of  Tree  Codes 

We  end  this  section  by  introducing  a  certain  type  of  tree  code  ensembles, 
which  will  be  used  mainly  for  proving  theorems. 

For any parameter (M,k), any channel input alphabet X, and any p.d. Q on
X, the single-user tree code ensemble Ens(M;k;X;Q) is a set of tree codes
$\Omega(M,k,X)$ with a probability measure $\mu$ on it. $\Omega(M,k,X)$ is the set of
all (M,k) tree codes with channel input alphabet X. $\mu$ is a measure defined
on the class of events that are expressible as countable unions and
intersections (the $\sigma$-algebra) of elementary events of the form

$$
E(u(..i),\xi) = \{\, e \in \Omega(M,k,X) : e(u(..i)) = \xi \in X^k \,\}.
$$

$E(u(..i),\xi)$ is the set of tree codes in $\Omega(M,k,X)$ for which $\xi$ is the label
of the branch immediately preceding node u(..i). $\mu$ is the extension measure
corresponding to the following probability assignment: For any collection
of distinct nodes $u_1(..m_1),\ldots,u_r(..m_r)$ and any $\xi_1,\ldots,\xi_r \in X^k$,

$$
\Pr\{ E(u_1(..m_1),\xi_1),\ldots,E(u_r(..m_r),\xi_r) \} = Q(\xi_1)\cdots Q(\xi_r).
$$

Thus, the statistical properties of a code chosen at random according to
$\mu$ coincide with those of a (M,k) tree code each of whose branches gets
a label $\xi \in X^k$ with probability $Q(\xi)$, independently of what is assigned
to other branches.

For any n-user parameter $(M_1,\ldots,M_n,k)$, any collection of channel input
alphabets $X_1,\ldots,X_n$, and any collection $Q_1,\ldots,Q_n$, where $Q_i$ is a p.d. on
$X_i$, the n-user tree code ensemble $\mathrm{Ens}(M_1,\ldots,M_n;k;X_1,\ldots,X_n;Q_1,\ldots,Q_n)$ is
defined as the set of all $(M_1,\ldots,M_n,k)$ tree codes for which $X_i$ is user i's
channel input alphabet, with the following probability measure $\mu$ on this
set. $\mu$ is best described by saying that it is the measure that would
exist on the joint tree code e corresponding to a collection of random,
mutually independent tree codes $e_1,\ldots,e_n$, where $e_i$ is selected according
to the probability measure associated with $\mathrm{Ens}(M_i;k;X_i;Q_i)$. In other
words, the statistical properties of a code chosen at random according
to $\mu$ are identical to those of a joint tree code in the situation where
each branch of each user's tree code is labelled independently of each
other branch, in such a way that $Q_i$ is the p.d. for branch labels in user
i's tree code, i=1,...,n.


1.4. Sequential Decoding for Multi-User Tree Codes

Sequential decoding is a decoding algorithm for tree codes invented by
Wozencraft [7], and later developed by Fano [8]. This section describes the
stack algorithm, a version of sequential decoding due to Zigangirov [9]
and Jelinek [10], and defines the concept of achievability for sequential
decoding. Familiarity with sequential decoding, to the extent that it is
given in any one of the references [11], [12], and [13], is assumed.


Sequential  decoding  is  a  tree  search  algorithm  for  finding  the  correct 
path  in  a  code  tree  based  on  the  information  available  from  the  received 
sequence.  The  algorithm  relies  on  what  is  called  a  metric  for  directing 
its  search.  The  metric  in  sequential  decoding  is  not  a  metric  in  the  usual 
mathematical  sense  of  the  word.  Ordinarily,  the  metric  is  intended  to  be 
a  function  that  measures  the  statistical  correlation  between  the  received 
sequence  and  the  hypothesized  transmitted  sequence. 

Formally, a metric for a channel $K=(P;X_1,\ldots,X_n;Y)$ and a $(M_1,\ldots,M_n,k)$
tree code e is any function of the form

$$
\Gamma : \bigcup_{h=1}^{\infty} (X_1 \times \cdots \times X_n)^{hk} \times Y^{hk} \longrightarrow [-\infty,+\infty).
$$

The value of the metric at a node u(..m) for a received sequence y is
given by $\Gamma(eu(..m),y(..m))$, where the notation is as given in §1.1.

It is important to note that $\Gamma(eu(..m),y(..m))$ does not depend on y(m+1),
y(m+2),..., the portion of the received sequence beyond level m. This
restriction is an integral part of sequential decoding; and without it,
some results of this thesis would not hold.

Also notice that the metric is allowed to take on the value $-\infty$. As will
be clear soon, this makes it possible to rule out a node permanently from
further consideration when there is no doubt that it is incorrect.




Example 1.4.1. The Fano Metric

The most well-known metric for sequential decoding is the Fano metric,
which was originally introduced by Fano for single-user channels [8]. In
the case of an n-user channel $K=(P;X_1,\ldots,X_n;Y)$ and a $(M_1,\ldots,M_n,k)$ tree
code e, the Fano metric takes the following form.

$$
\Gamma(eu(..m),y(..m)) = \sum_{h=1}^{m} \Big\{ \ln \frac{P(y(h)\,|\,eu(h))}{\omega(y(h))} - kR \Big\},
$$

where $\omega$ is a p.d. on $Y^k$ and $R = (1/k)\sum_{i=1}^{n} \ln M_i$.

In practice, one might pick e at random according to the probability
measure associated with an ensemble $\mathrm{Ens}(M_1,\ldots,M_n;k;X_1,\ldots,X_n;Q_1,\ldots,Q_n)$
and set

$$
\omega(\eta) = \sum_{\xi_1 \in X_1^k} \cdots \sum_{\xi_n \in X_n^k} Q_1(\xi_1)\cdots Q_n(\xi_n)\, P(\eta\,|\,\xi_1,\ldots,\xi_n)
$$

for each $\eta \in Y^k$.

The Fano metric is branchwise additive; that is,

$$
\Gamma(eu(..m),y(..m)) = \Gamma(eu(..m-1),y(..m-1)) + \gamma(eu(m),y(m)),
$$

where

$$
\gamma(eu(m),y(m)) = \ln \frac{P(y(m)\,|\,eu(m))}{\omega(y(m))} - kR.
$$

Branchwise additive metrics are simpler to implement and easier to
analyze; but these are not compelling reasons to restrict our discussion
to this class of metrics, and we do not do so.
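
To make the branchwise form concrete, the following sketch evaluates a Fano-type branch metric for a single-user code. Everything specific in it is an assumption made for illustration: the channel is taken to be a memoryless binary symmetric channel with crossover probability `p`, the output distribution is taken to be uniform on blocks of length k, and the function names are hypothetical. A zero-probability branch is given the value minus infinity, in line with the remark above that the metric may take that value.

```python
import math


def fano_branch_metric(x_block, y_block, p, rate_nats):
    """ln(P(y|x) / omega(y)) - k*R for one branch, on an assumed BSC with crossover p."""
    k = len(x_block)
    # Number of positions where the branch label and the received block differ.
    d = sum(a != b for a, b in zip(x_block, y_block))
    prob = (p ** d) * ((1 - p) ** (k - d))  # block transition probability
    if prob == 0.0:
        return -math.inf                     # the branch is ruled out
    omega = 0.5 ** k                         # assumed uniform output distribution
    return math.log(prob / omega) - k * rate_nats


def fano_path_metric(x_blocks, y_blocks, p, rate_nats):
    """Branchwise additivity: the node metric is the sum of its branch metrics."""
    return sum(fano_branch_metric(x, y, p, rate_nats)
               for x, y in zip(x_blocks, y_blocks))


rate = 0.5 * math.log(2)  # rate of a (2,2) tree code in nats
print(fano_branch_metric((0, 0), (0, 0), 0.1, rate) > 0)   # True
print(fano_branch_metric((0, 0), (1, 1), 0.1, rate) < 0)   # True
```

With these assumed numbers a branch that agrees with the received block gains metric, and a branch that disagrees in both digits loses metric.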


The  Stack  Algorithm 

There are two well-known versions of sequential decoding, namely, the
Fano algorithm and the stack algorithm. For practical purposes, the Fano
algorithm is preferable since it requires almost no storage. However, in
this thesis, we shall consider only the stack algorithm, mainly because it
is much simpler to describe and analyze. Let us point out that the results
of our analyses hold for the Fano algorithm without any essential
changes.

In the stack algorithm, there is a list of nodes in which nodes are ordered
with respect to their metric values. This list is referred to as the stack.
The metric values of the nodes in the stack increase towards the top of
the stack. Ties between the metric values in the ordering of nodes are
broken by some fixed but arbitrary rule. Each step of the stack algorithm
consists of deleting the node at the stack-top and inserting its
immediate descendants into the stack. At the start of the algorithm, the
origin is the only node in the stack, and it has a metric value of zero.

In  practice,  all  tree  codes  are  truncated  at  some  finite  level,  and  the 
stack  algorithm  stops  as  soon  as  a  node  at  the  last  level  of  the  code  tree 
reaches  the  stack-top.  The  stack-top-node  is  then  taken  as  the  output  of 
the  sequential  decoder.  If  the  rate  is  sufficiently  small,  reliability  of  the 
decoder  output  can  be  improved  by  increasing  the  length  of  the  finite  tree 
code.  The  remarkable  point  about  sequential  decoding  is  the  possibility  of 
making  the  average  decoding  complexity  independent  of  the  length  of  the 
tree  code,  and  thus,  of  the  desired  level  of  reliability. 
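
The stack algorithm as described above can be sketched in a few lines. The example below is an illustration under stated assumptions, not the decoder analyzed in this thesis: it uses the single-user (2,2) code of §1.3, an assumed binary symmetric channel with crossover probability `p`, a Fano-type branch metric with a uniform output distribution, and Python's `heapq` as the ordered list. Ties are broken by insertion order, which is one possible fixed rule. All function names are hypothetical.

```python
import heapq
import math


def branch_label(path):
    """Label of the last branch on `path` in the (2,2) example code of section 1.3."""
    if len(path) == 1:
        return (path[0], path[0])
    return ((path[-2] + path[-1]) % 2, path[-1])


def branch_metric(label, received, p, rate_nats):
    """Fano-type branch metric on an assumed BSC with crossover probability p."""
    k = len(label)
    d = sum(a != b for a, b in zip(label, received))
    prob = (p ** d) * ((1 - p) ** (k - d))
    if prob == 0.0:
        return -math.inf
    return math.log(prob / 0.5 ** k) - k * rate_nats


def stack_decode(received_blocks, p, rate_nats):
    """Return (decoded path, number of nodes that reached the stack-top)."""
    depth = len(received_blocks)
    counter = 0                       # insertion order, used to break ties
    # heapq is a min-heap, so metrics are stored negated.
    stack = [(0.0, counter, ())]      # the origin, with metric zero
    steps = 0
    while stack:
        neg_metric, _, path = heapq.heappop(stack)   # delete the stack-top
        steps += 1
        if len(path) == depth:        # a last-level node reached the top
            return list(path), steps
        for digit in (0, 1):          # insert the immediate descendants
            child = path + (digit,)
            gain = branch_metric(branch_label(child),
                                 received_blocks[len(path)], p, rate_nats)
            counter += 1
            heapq.heappush(stack, (neg_metric - gain, counter, child))
    raise RuntimeError("stack emptied before the last level was reached")


source = (0, 1, 0, 1, 1)
received = [branch_label(source[:m]) for m in range(1, len(source) + 1)]
decoded, steps = stack_decode(received, p=0.02, rate_nats=0.5 * math.log(2))
print(decoded == list(source), steps)  # True 6
```

In this noise-free run only the origin and the five correct nodes reach the stack-top, so no node of an incorrect subtree is ever extended.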

The  following  definitions  formalize  the  concept  of  decoding  complexity. 

Definition 1.4.1. A Measure of Decoding Complexity

If the stack algorithm is used, with $\Gamma$ as its metric, in decoding a tree
code e over a channel K, then $C_j(K,e,\Gamma,s,y)$ denotes the number of nodes in
$I_j(s)$, the jth incorrect subset of the correct path, which reach the
stack-top, conditional on s being the correct path and y being the
received sequence.

$C_j(K,e,\Gamma)$ denotes the expected value of $C_j(K,e,\Gamma,s,y)$ with respect to
the joint p.d. on s and y. That is, $C_j(K,e,\Gamma) = E_s E_{y|es}\, C_j(K,e,\Gamma,s,y)$,
where $E_s$ denotes expectation with respect to the p.d. on s and $E_{y|es}$
denotes expectation with respect to the p.d. on y conditional on es being
the transmitted sequence.

For each L, $D_L(K,e,\Gamma)$ is defined to be $(C_1(K,e,\Gamma)+\cdots+C_L(K,e,\Gamma))/L$. □

Observe that $L\,D_L(K,e,\Gamma)$ is an upper bound on the expected number of
nodes which reach the stack-top before the algorithm reaches level L on
the correct path for the first time. Hence, for large L, $D_L(K,e,\Gamma)$ can be
taken as an approximate measure of the average number of computations for
the algorithm to move one step along the correct path. These considerations
motivate the following definition.


Definition 1.4.2. A Criterion of Applicability

A point $R=(R_1,\ldots,R_n)$ is said to be an achievable rate for sequential
decoding on a channel $K=(P;X_1,\ldots,X_n;Y)$ if

1) $R_i \ge 0$ for each i=1,...,n, and

2) there exists a finite constant A, A=A(K,R), such that, for every L there
exist
i) a code e with rate at least as large as R
and ii) a metric $\Gamma$
such that $D_L(K,e,\Gamma) < A$.

(Condition i) above means that, if $(M_1,\ldots,M_n,k)$ is the parameter of e, then
$(1/k)\ln M_i \ge R_i$ for each i=1,...,n.)

The closure of the set of all such R is called the achievable rate region
of sequential decoding and is denoted by R(K). □


The above definition of achievability allows e and $\Gamma$ to depend on L. Now,
one may ask, quite justifiably, why the definition of achievability does
not read as follows.


Definition 1.4.3.

A point $R=(R_1,\ldots,R_n)$ is said to be a strongly achievable rate for
sequential decoding on a channel $K=(P;X_1,\ldots,X_n;Y)$ if

1) $R_i \ge 0$ for each i=1,...,n, and

2) there exists a finite constant A=A(K,R) such that there exist
i) a code e with rate at least as large as R
and ii) a metric $\Gamma$
such that $D_L(K,e,\Gamma) < A$ for all L. □


Unlike Definition 1.4.2, Definition 1.4.3 requires that e and $\Gamma$ be chosen
independently of L. Clearly, if R is achievable in the sense of Def. 1.4.3,
then R is also achievable in the sense of Def. 1.4.2.


The  concept  of  achievability  used  in  the  literature  on  sequential  decoding 
coincides  with  that  of  Def.  1.4.2.  It  is  not  known  if  strong  achievability 
and  achievability  are  equivalent,  even  for  the  single-user  case.  (Resolving 
this  question  might  contribute  greatly  to  our  understanding  of  sequential 
decoding.)  Strong  achievability  is  not  used  anywhere  in  this  thesis  for  the 
following  reasons.  First,  despite  some  efforts,  we  have  not  been  able  to 
prove  that  any  non-trivial  rate  is  strongly  achievable.  Second,  for  finite 
tree  codes,  which  are  the  only  type  of  tree  codes  of  practical  interest, 
strong  achievability  is  unnecessarily  restrictive. 

To  illustrate  that  achievability  in  the  sense  of  Def.  1.4.2  is  sufficient  for 
practical  purposes,  consider  a  situation  where  the  desired  rate  and  the 
desired  level  of  reliability  are  given.  Suppose  that  the  desired  rate  is 
achievable. Then, given any L, there exists an infinite tree code e with
the desired rate and a metric $\Gamma$ such that $D_L(K,e,\Gamma) < A$, where A is a
finite constant, independent of e, L, and $\Gamma$. The idea is to pick L large
enough so that, among those code-metric pairs satisfying $D_L(K,e,\Gamma) < A$,
there exist e and $\Gamma$ such that: When the stack algorithm is applied, with $\Gamma$
as its metric, to the finite tree code that is obtained by truncating e at
level L, the desired reliability is also satisfied. A tail, i.e., a part where
no branching occurs, may be appended to the truncated code in order to
increase the reliability of the final digits of the decoded sequence.


1.5. Summary of Results

The  research  reported  in  this  thesis  has  been  aimed  mainly  at  finding  a 
characterization  of  R,  the  achievable  rate  region  of  sequential  decoding. 
This  goal  has  not  been  achieved  and  no  general  characterization  of  R  is 
known  at  present;  there  are,  however,  some  partial  results,  which  we  now 
summarize. 

The Result on Achievability

The following theorem is the main result of this thesis on achievability.
For notational simplicity, it is stated here for the two-user case. In
Chapter 2, it is restated and proved for an arbitrary number of users.

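Theorem 2.2.1.

For any two-user channel $K=(P;X_1,X_2;Y)$, R(K) is inner-bounded by R0(K),
which is defined as follows.

$$
R_0(K) = \bigcup_{Q} R_0(K,Q),
$$

where the union is over all $Q=(Q_1,Q_2)$ such that $Q_1$ is a p.d. on $X_1^k$ and $Q_2$
is a p.d. on $X_2^k$ for some arbitrary integer k (same k for both $Q_1$ and $Q_2$);
and for any such Q, $R_0(K,Q)$ is defined as the set of all $(R_1,R_2)$ such that

$$
\begin{aligned}
0 \le R_1 &\le -\frac{1}{k}\ln \sum_{\xi_2 \in X_2^k} Q_2(\xi_2) \sum_{\eta \in Y^k} \Big\{ \sum_{\xi_1 \in X_1^k} Q_1(\xi_1)\sqrt{\mathbf{P}(\eta\,|\,\xi_1,\xi_2)} \Big\}^2, \\
0 \le R_2 &\le -\frac{1}{k}\ln \sum_{\xi_1 \in X_1^k} Q_1(\xi_1) \sum_{\eta \in Y^k} \Big\{ \sum_{\xi_2 \in X_2^k} Q_2(\xi_2)\sqrt{\mathbf{P}(\eta\,|\,\xi_1,\xi_2)} \Big\}^2, \\
R_1 + R_2 &\le -\frac{1}{k}\ln \sum_{\eta \in Y^k} \Big\{ \sum_{\xi_2 \in X_2^k} \sum_{\xi_1 \in X_1^k} Q_1(\xi_1) Q_2(\xi_2)\sqrt{\mathbf{P}(\eta\,|\,\xi_1,\xi_2)} \Big\}^2.
\end{aligned}
$$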
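
This theorem is proved by showing that R0 is achievable by the following
class of metrics: Members of the class are identified by a parameter
(K,k,Q,B) where K is a channel, say $K=(P;X_1,X_2;Y)$; k is a positive integer;
$Q=(Q_1,Q_2)$ where $Q_1$ is a p.d. on $X_1^k$ and $Q_2$ is a p.d. on $X_2^k$; and
$B=(B_1,B_2,B_3)$ is what is called the bias function. The member of the class
with parameter (K,k,Q,B) is based on a branch metric

$$
\gamma : (X_1 \times X_2)^k \times Y^k \longrightarrow [-\infty,+\infty),
$$

such that, for each $\eta \in Y^k$ and $\xi = \xi_1 \times \xi_2$, where $\xi_1 \in X_1^k$, $\xi_2 \in X_2^k$,

[...]

where

$$
\begin{aligned}
\gamma_1(\xi,\eta) &= \ln \frac{\sqrt{\mathbf{P}(\eta\,|\,\xi)}}{\sum_{\zeta \in X_2^k} Q_2(\zeta)\sqrt{\mathbf{P}(\eta\,|\,\xi_1,\zeta)}} - kB_1, \\
\gamma_2(\xi,\eta) &= \ln \frac{\sqrt{\mathbf{P}(\eta\,|\,\xi)}}{\sum_{\zeta \in X_1^k} Q_1(\zeta)\sqrt{\mathbf{P}(\eta\,|\,\zeta,\xi_2)}} - kB_2, \quad \text{and} \\
\gamma_3(\xi,\eta) &= \ln \frac{\sqrt{\mathbf{P}(\eta\,|\,\xi)}}{\sum_{\zeta_1 \in X_1^k} \sum_{\zeta_2 \in X_2^k} Q_1(\zeta_1) Q_2(\zeta_2)\sqrt{\mathbf{P}(\eta\,|\,\zeta_1,\zeta_2)}} - kB_3.
\end{aligned}
$$

Here, $\mathbf{P}$ is the transition probability of K over blocks of length k. (We use
boldface characters to indicate quantities relating to blocks.) $\mathbf{P}(\eta\,|\,\xi_1,\zeta)$
is the probability that $\eta$ is received given that user 1 transmits $\xi_1$ and
user 2 transmits $\zeta$.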


A  full  intuitive  account  of  the  above  metric  cannot  be  given  at  this  point, 


because the form of the metric itself is closely related to the method we
use in §2.1 to prove that a given rate is achievable.


This  metric  is  the  only  metric  known  to  achieve  R0(K)  for  all  K.  Our 
efforts  to  show  that  the  Fano  metric  (or  simple  modifications  of  it) 
achieves  R0  have  not  been  successful.  In  view  of  this,  we  regard  the 
introduction  of  the  above  metric  as  a  major  contribution  of  this  thesis. 

Converse  Results 

Converse  arguments  aim  at  finding  outer  bounds  to  the  achievable  rate 
region  of  sequential  decoding.  The  main  converse  results  of  this  thesis 
are  as  follows. 

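Theorem 3.2.1. For any single-user channel K, R(K)=R0(K). □

For single-user channels, the region R0(K) is the interval $[0, R_0(K)]$
(see §2.3 or pp. 149-50 of [12]), where the scalar $R_0(K)$ is given by

$$
R_0(K) = \max_{Q} \; -\ln \sum_{\eta \in Y} \Big\{ \sum_{\xi \in X} Q(\xi)\sqrt{P(\eta\,|\,\xi)} \Big\}^2,
$$

where the maximum is taken over all p.d.'s Q on X.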
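
As a numerical illustration of this expression, the sketch below evaluates the quantity inside the maximum for a given Q. The binary symmetric channel and its crossover probability `p` are assumptions chosen for illustration, as are the function and variable names. For that channel the uniform input distribution is the maximizer, by symmetry, and the value has the closed form ln 2 - ln(1 + 2 sqrt(p(1-p))) nats, which the code compares against.

```python
import math


def cutoff_rate(P, Q):
    """-ln sum_eta ( sum_xi Q(xi) sqrt(P(eta|xi)) )^2 in nats.

    P[xi][eta] is the transition probability and Q[xi] the input distribution.
    """
    num_outputs = len(P[0])
    total = 0.0
    for eta in range(num_outputs):
        inner = sum(q * math.sqrt(row[eta]) for q, row in zip(Q, P))
        total += inner ** 2
    return -math.log(total)


p = 0.1                                  # assumed crossover probability
bsc = [[1 - p, p], [p, 1 - p]]
r0 = cutoff_rate(bsc, [0.5, 0.5])
closed_form = math.log(2) - math.log(1 + 2 * math.sqrt(p * (1 - p)))
print(abs(r0 - closed_form) < 1e-12)     # True
```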

The achievability of all R, for $R \in [0, R_0(K))$, is a special case of Theorem
2.2.1 and it has been well-known, see, e.g., [11], [12], or [13]. But the
converse statement, that rates greater than $R_0(K)$ are not achievable, is
new and will be proved in §3.2.

The strongest converse prior to this was due to Jacobs and Berlekamp
[14], which stated that rates in excess of $\bar{E}_0(K,1)$ are not achievable.
Here, $\bar{E}_0(K,1)$ is the value, at $\rho=1$, of $\bar{E}_0(K,\rho)$, which Jacobs and
Berlekamp defined as the smallest concave function greater than or equal to

$$
E_0(K,\rho) = \max_{Q} \; -\ln \sum_{\eta \in Y} \Big\{ \sum_{\xi \in X} Q(\xi)\, P(\eta\,|\,\xi)^{1/(1+\rho)} \Big\}^{1+\rho},
$$

where the maximization is over all p.d.'s Q on X.

Note that $E_0(K,1) = R_0(K)$; hence, our result is an improvement over that of
Jacobs and Berlekamp only for channels for which $E_0(K,1) < \bar{E}_0(K,1)$. We do
not have an example for which $E_0(K,1) < \bar{E}_0(K,1)$, but we believe that such
channels exist. It is known, for example, that there exists K for which
$E_0(K,\rho)$ is not a concave function of $\rho$ [14]; for any such K,
$E_0(K,\rho) < \bar{E}_0(K,\rho)$ at some $\rho \ge 0$.

R0(K)  has  been  called  the  cut-off  rate  of  channel  K  with  the  understanding 
that  at  rates  above  R0(K)  the  average  complexity  of  sequential  decoding  is 
infinite.  The  above  theorem  justifies  the  use  of  this  term. 

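Theorem 3.3.1. R(K) = R0(K) for any channel $K=(P;X_1,\ldots,X_n;Y)$ which has
the property that

$$
\sum_{\eta \in Y} \sqrt{P(\eta\,|\,\xi_1,\ldots,\xi_n)\, P(\eta\,|\,\zeta_1,\ldots,\zeta_n)}\; \log \frac{P(\eta\,|\,\xi_1,\ldots,\xi_n)}{P(\eta\,|\,\zeta_1,\ldots,\zeta_n)} = 0
$$

for every $\xi_i, \zeta_i \in X_i$, i=1,...,n. □

Channels with the above property are called pairwise reversible channels
[16]; an example is the TEC of Figure 1.2.1.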

The  above  converses  determine  R  for  two  special  classes  of  channels. 
However,  R  remains  undetermined  in  the  general  case.  It  might  be  that 
R(K) equals R0(K) for all K, but this has not been proved yet, except in an
ensemble  average  sense  (see  Theorem  3.4.1).  No  examples  have  been  found 
for  which  R  is  strictly  larger  than  R0,  either. 

Non-Joint Sequential Decoding

Chapter  4  considers  an  alternative  approach  to  sequential  decoding  and 
finds  an  inner  bound  to  its  (appropriately  defined)  achievable  rate  region. 
Non-joint  sequential  decoding,  as  this  approach  is  called,  uses  a  separate 
sequential  decoder  for  each  user;  the  decoder  for  a  given  user  decodes 
that  user's  message  without  any  knowledge  of  the  tree  codes  of  the 
remaining  users. 

In exchange for the increase in the number of decoders, non-joint
decoding  allows  each  decoder  to  be  much  simpler  than  a  joint  decoder.  It 
is  demonstrated  by  an  example  in  Chapter  4  that  non-joint  sequential 
decoding,  in  addition  to  being  simpler,  sometimes  achieves  rates  that  are 
unachievable  by  ordinary  sequential  decoding.  This  seemingly  paradoxical 
result  is  then  explained,  and  conclusions  are  drawn  about  the  nature  of 
achievability  in  sequential  decoding. 

This  completes  the  summary  of  the  main  results.  In  the  remaining  part  of 
this  section,  we  shall  consider  some  examples  and  try  to  answer  some 
specific  questions  about  sequential  decoding. 

Example  1.5.1. 

a)  Two-User  OR  Channel  (Figure  1.5.1) 

For  this  channel,  it  is  known  that  R=R0=C;  in  other  words,  the  achievable 
rate  region  of  sequential  decoding  coincides  with  the  capacity  region. 

One  particular  feature  of  the  OR  channel,  which  we  wish  to  discuss,  is 
that  it  is  noiseless:  that  is,  the  channel  output  is  completely  determined 
by  the  channel  inputs.  Noiseless  channels  are  pairwise  reversible.  Hence, 
by Theorem 3.3.1, R(K)=R0(K) for all noiseless K. Furthermore, for any
noiseless K, one can achieve R0(K) by simply using a metric that has only
two values, namely 0 and $-\infty$. This metric assigns 0 to consistent nodes
and $-\infty$ to inconsistent ones. A node u(..j) is said to be consistent if its
correctness cannot be ruled out on the basis of y(..j), the first j blocks
of the received sequence.

b)  Two-User  Erasure  Channel  (TEC)  (Figure  1.5.2) 

This  is  another  noiseless  channel,  so  we  know  that  R0(TEC)=R(TEC).  The 
shaded  region  in  Figure  1.5.2  is  an  inner  bound  to  R0(TEC),  obtained  by 
computing R0(TEC,Q) for $Q=(Q_1,Q_2)$ with $Q_1=Q_2$ equal to the uniform distribution
on  {0,1}.  R0(TEC,Q)  is  not  equal  to  R0(TEC),  because  clearly,  the  points 
(0,1)  and  (1,0)  belong  to  R0(TEC).  So,  a  larger  inner  bound  to  R0(TEC)  can 
be  obtained  by  taking  the  convex-hull  of  the  union  of  the  shaded  region 
with  the  points  (0,1)  and  (1,0). 

Figure 1.5.2 shows that sum rates of up to 1.42 bits are achievable by
sequential  decoding.  In  Example  1.2.1,  a  simple  block  code  achieving  a 
sum  rate  of  approximately  1.3  bits  was  given.  We  do  not  know,  however, 
of  any  comparably  simple  block  codes  which  achieve  sum  rates  as  high  as 
1.42  bits,  while  maintaining  an  arbitrarily  small  probability  of  error. 
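
The figure of 1.42 bits can be reproduced from the three inequalities of Theorem 2.2.1 with k=1 and uniform input distributions. The sketch below relies on an assumption about the channel, since Figure 1.2.1 is not reproduced in this section: the TEC is taken to have binary inputs and to deliver the common input digit when the two users agree and an erasure symbol when they disagree. All names in the code are hypothetical.

```python
import math
from itertools import product

INPUTS = (0, 1)
OUTPUTS = (0, 1, "e")   # "e" stands for the erasure symbol


def tec(x1, x2):
    # Assumed channel law: agreement is passed through, disagreement is erased.
    return x1 if x1 == x2 else "e"


def sqrt_p(eta, x1, x2):
    # sqrt(P(eta | x1, x2)); the channel is noiseless, so P is 0 or 1.
    return 1.0 if tec(x1, x2) == eta else 0.0


def r0_bounds_bits(q1, q2):
    """Right-hand sides of the three inequalities defining R0(K,Q) for k = 1, in bits."""
    s1 = sum(q2[x2] * sum(sum(q1[x1] * sqrt_p(eta, x1, x2) for x1 in INPUTS) ** 2
                          for eta in OUTPUTS)
             for x2 in INPUTS)
    s2 = sum(q1[x1] * sum(sum(q2[x2] * sqrt_p(eta, x1, x2) for x2 in INPUTS) ** 2
                          for eta in OUTPUTS)
             for x1 in INPUTS)
    s12 = sum(sum(q1[x1] * q2[x2] * sqrt_p(eta, x1, x2)
                  for x1, x2 in product(INPUTS, INPUTS)) ** 2
              for eta in OUTPUTS)
    return tuple(-math.log2(s) for s in (s1, s2, s12))


uniform = {0: 0.5, 1: 0.5}
r1_max, r2_max, sum_max = r0_bounds_bits(uniform, uniform)
print(round(r1_max, 3), round(r2_max, 3), round(sum_max, 3))  # 1.0 1.0 1.415
```

Under this assumed channel law the sum-rate bound is log2(8/3), about 1.415 bits, which rounds to the 1.42 bits quoted above, and each individual rate is bounded by 1 bit.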

c)  Two-User  Additive  Gaussian  Noise  Channel  (AGNC)  (Figure  1.5.3) 

This is a channel with non-discrete input and output alphabets. Our
results do not directly apply to such channels since we are considering
only discrete channels. Nevertheless, the AGNC is of special interest
because of its practical relevance. The treatment here is brief, however;
and we refer to [6] for more about this channel.

The channel input and output alphabets for AGNC's are the set of real
numbers. If $\eta$, $\xi_1$, and $\xi_2$ denote, respectively, the received number, the
number transmitted by user 1, and the number transmitted by user 2, then
$\eta-\xi_1-\xi_2$ (the noise) is a random variable with distribution $N(0,\sigma^2)$. Here,
$N(0,\sigma^2)$ is the Gaussian density function with mean 0 and variance $\sigma^2$.
There are energy constraints on the inputs of the form: $E(\xi_1^2) \le \mathcal{E}_1$ and
$E(\xi_2^2) \le \mathcal{E}_2$, where E denotes expected value in a time and code average
sense. (In the absence of energy constraints, the capacity region and the
achievable rate region of sequential decoding are unbounded.)

Figure 1.5.3 shows C(AGNC), the capacity region, and an inner bound to
R0(AGNC). The inner bound is obtained by computing R0(AGNC,q) for
$q=(N(0,\mathcal{E}_1),N(0,\mathcal{E}_2))$. The computation of R0(AGNC,q) is carried out in the
same way as for discrete channels, except that sums are replaced by
integrals and probability distributions by densities.

Notice that, if $\sigma^2$ is fixed, the achievable rate region of sequential
decoding for an AGNC with constraints $E(\xi_1^2) \le 2\mathcal{E}_1$ and $E(\xi_2^2) \le 2\mathcal{E}_2$ is at
least as large as the capacity region of an AGNC with constraints
$E(\xi_1^2) \le \mathcal{E}_1$ and $E(\xi_2^2) \le \mathcal{E}_2$. So, at the expense of at most doubling the
"energy", we can achieve all points in the capacity region of a given AGNC
by sequential decoding.
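
The factor of two can be traced through a short calculation. The following is a sketch, not a derivation taken from this thesis: it assumes Gaussian input densities as in the inner bound above, uses the closed form of the resulting Gaussian integral, and quotes the standard expression for the capacity region of the two-user Gaussian channel. The symbols $\mathcal{E}$, $\mathcal{E}_1$, $\mathcal{E}_2$ denote the energy constraints and $\sigma^2$ the noise variance.

```latex
% Single user, input density q = N(0, E), noise variance sigma^2:
\begin{align*}
\int \Big\{ \int q(\xi)\sqrt{P(\eta\,|\,\xi)}\,d\xi \Big\}^{2} d\eta
  &= \Big(1+\frac{\mathcal{E}}{2\sigma^{2}}\Big)^{-1/2},
\qquad\text{so}\qquad
R_0(q) = \tfrac12 \ln\Big(1+\frac{\mathcal{E}}{2\sigma^{2}}\Big).
\end{align*}
% Two users: for fixed xi_2 the channel seen by user 1 is a shifted
% single-user channel, and xi_1 + xi_2 is N(0, E_1 + E_2), so the three
% bounds of Theorem 2.2.1 (with k = 1) become
\begin{align*}
R_1 &\le \tfrac12 \ln\Big(1+\frac{\mathcal{E}_1}{2\sigma^{2}}\Big), &
R_2 &\le \tfrac12 \ln\Big(1+\frac{\mathcal{E}_2}{2\sigma^{2}}\Big), &
R_1+R_2 &\le \tfrac12 \ln\Big(1+\frac{\mathcal{E}_1+\mathcal{E}_2}{2\sigma^{2}}\Big).
\end{align*}
% Standard capacity region of the two-user Gaussian channel:
\begin{align*}
R_1 &\le \tfrac12 \ln\Big(1+\frac{\mathcal{E}_1}{\sigma^{2}}\Big), &
R_2 &\le \tfrac12 \ln\Big(1+\frac{\mathcal{E}_2}{\sigma^{2}}\Big), &
R_1+R_2 &\le \tfrac12 \ln\Big(1+\frac{\mathcal{E}_1+\mathcal{E}_2}{\sigma^{2}}\Big).
\end{align*}
```

Replacing $\mathcal{E}_1$ and $\mathcal{E}_2$ by $2\mathcal{E}_1$ and $2\mathcal{E}_2$ in the first set of bounds gives exactly the second set, which is the doubling statement made above.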


Figure  1.5.3.  Additive  Gaussian  noise  channel 

Comments and Remarks

Here we wish to discuss informally some questions that may have arisen
up to this point.

Q. What makes sequential decoding of multi-user tree codes a different,
if not a more difficult, problem than sequential decoding of one-user tree
codes?

A. The complication in multi-user sequential decoding is due to the
presence  of  different  types  of  incorrect  paths  which  have  markedly 
different  statistical  properties  in  relation  to  the  correct  path.  Despite 
this,  one  has  to  design  a  metric  that  distinguishes  the  correct  path  from 
these  various  types  of  incorrect  paths.  While  the  design  of  such  a  metric 
may  not  seem  to  be  a  problem  (because  the  correct  path  has  a  higher 
correlation  with  the  channel  output  sequence  than  any  other  path),  it  is 
not  at  all  clear  whether  such  additional  constraints  on  the  metric  do  not 
force  the  achievable  rate  region  of  sequential  decoding  to  be  much  too 
small  to  make  it  attractive. 

To  discuss  the  above  ideas  in  more  concrete  terms,  consider  a  two-user 
tree  code.  Let  s1  and  s2  be  the  correct  paths  for  users  1  and  2.  Sequential 
decoding aims at finding $s_1 \times s_2$ based on the information available from
the received sequence y. For simplicity, let us consider only the incorrect
paths in $I_1(s_1 \times s_2)$, the first incorrect subtree of the correct path. There
are three types of paths in $I_1(s_1 \times s_2)$: 1) Totally incorrect paths of the
form $u_1 \times u_2$ where $u_1 \ne s_1$ and $u_2 \ne s_2$. 2) Half incorrect paths of the form
$u_1 \times s_2$ where $u_1 \ne s_1$. 3) Half incorrect paths of the form $s_1 \times u_2$ where
$u_2 \ne s_2$.

Paths of type 1 have no correlation with y; hence, they are relatively
easy to detect and eliminate from further search. But paths of types 2
and 3 are correlated with y. This is precisely the point where multi-user
sequential decoding differs from and becomes more difficult than
single-user sequential decoding.

Q. Do we know of simpler characterizations of the regions R and R0?

A. In general, there are no known characterizations of the regions R and
R0 which are simpler than their definitions. Clearly, the definitions of
these regions do not immediately suggest any algorithms for determining
whether a given point belongs to these regions.

While so little is known in terms of computing R and R0 in general, the
situation is completely solved in the one-user case. For any one-user
channel K=(P;X;Y), we have

\[ R(K) = R_0(K) = \Bigl[0,\ \sup_{Q} R_0(K,Q)\Bigr], \]

where the supremum is over all p.d.'s Q on X^k for some arbitrary integer
k, and for any such Q,

\[ R_0(K,Q) = -\frac{1}{k}\ln\sum_{\eta\in Y^k}\Bigl\{\sum_{\xi\in X^k} Q(\xi)\sqrt{P(\eta|\xi)}\Bigr\}^2 . \]
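
As a minimal numerical sketch (not part of the original analysis), the expression for R0(K,Q) can be evaluated directly for block length k=1. The function name `r0`, the list-of-lists representation of P, and the binary symmetric channel used as the example are all assumptions made for illustration.

```python
import math

def r0(P, Q):
    """Return R0(K,Q) in nats for block length k = 1.

    P[x][y] is the transition probability P(y|x); Q[x] is the input p.d.
    """
    total = 0.0
    for y in range(len(P[0])):
        inner = sum(Q[x] * math.sqrt(P[x][y]) for x in range(len(Q)))
        total += inner ** 2
    return -math.log(total)

# Assumed example: binary symmetric channel with crossover probability 0.1.
p = 0.1
bsc = [[1 - p, p], [p, 1 - p]]
uniform_r0 = r0(bsc, [0.5, 0.5])
print(uniform_r0)
```

For this symmetric example the value agrees with the closed form ln 2 - ln(1 + 2 sqrt(p(1-p))), which gives a quick sanity check on the implementation.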

The computation of R(K) is made possible by Gallager's parallel channels
theorem (see pp. 149-50 of [12]), which states that in order to maximize
R0(K,Q) over Q, one needs to consider only p.d.'s over X, i.e.,

\[ \sup\{R_0(K,Q): Q \text{ is a p.d. on } X^k \text{ for some integer } k\} = \sup\{R_0(K,Q): Q \text{ is a p.d. on } X\}. \]

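The computation of R0(K):=sup{R0(K,Q): Q is a p.d. on X} is facilitated by
the following necessary and sufficient conditions for a p.d. Q on X to
maximize R0(K,Q) (see Theorem 5.5.5 in [12]):

\[ \sum_{\eta\in Y}\sqrt{P(\eta|\xi)}\sum_{\zeta\in X}Q(\zeta)\sqrt{P(\eta|\zeta)} \ \ge\ \sum_{\eta\in Y}\Bigl\{\sum_{\zeta\in X}Q(\zeta)\sqrt{P(\eta|\zeta)}\Bigr\}^2 \quad\text{for all } \xi\in X, \]

with equality if Q(ξ)>0.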

These  conditions  are  extremely  useful  in  verifying  whether  a  given  Q, 
which  may  have  been  guessed  on  the  basis  of  intuition,  does  indeed 
maximize R0(K,Q). It is unfortunate that there is no analogue of the
parallel  channels  theorem  in  the  multi-user  case. 
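
The verification step described above can be sketched as follows. This is only an illustration for k=1, assuming the conditions read "sum over outputs of sqrt(P(y|x)) alpha(y) is at least the sum of alpha(y)^2, with equality where Q(x)>0", with alpha(y) = sum_z Q(z) sqrt(P(y|z)). The function name `maximizes_r0`, the tolerance, and the example channel are hypothetical.

```python
import math

def maximizes_r0(P, Q, tol=1e-9):
    """Check the necessary and sufficient conditions for Q (block length 1)."""
    outputs = range(len(P[0]))
    inputs = range(len(Q))
    # alpha[y] = sum_z Q(z) sqrt(P(y|z))
    alpha = [sum(Q[z] * math.sqrt(P[z][y]) for z in inputs) for y in outputs]
    rhs = sum(a * a for a in alpha)
    for x in inputs:
        lhs = sum(math.sqrt(P[x][y]) * alpha[y] for y in outputs)
        if lhs < rhs - tol:
            return False  # the inequality itself is violated
        if Q[x] > 0 and lhs > rhs + tol:
            return False  # equality is required wherever Q(x) > 0
    return True

# Assumed example: binary symmetric channel with crossover probability 0.1.
p = 0.1
bsc = [[1 - p, p], [p, 1 - p]]
print(maximizes_r0(bsc, [0.5, 0.5]))
print(maximizes_r0(bsc, [0.9, 0.1]))
```

In this example the uniform distribution passes the check and the skewed one does not, as one would guess from the symmetry of the channel.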

Q. Are R(K) and R0(K) convex regions for all K?

A. It is not known if R(K) is convex for all K. (Note that one may not need
to have an explicit characterization of R(K) to prove that it is convex.)

It is known that R0(K) is convex for all K. The convexity of R0 should not
be  attributed  to  the  possibility  of  time-sharing  between  a  number  of  tree 
codes  and  decoding  each  code  by  a  separate  sequential  decoder.  That 
argument  overlooks  the  fact  that  a  collection  of  sequential  decoders 
working  on  different  codes  is  not  equivalent  to  any  single  sequential 
decoder. 

The convexity of R0 can still be explained by the idea of time-sharing,
however;  but  we  must  consider  time-sharing  within  a  code  as  opposed  to 
between  a  number  of  different  codes.  Time-sharing  within  a  code  is 
achieved  by  taking  the  branches  of  the  tree  code  long  enough  so  that 
conventional  time-sharing  can  in  effect  be  used  within  the  duration  of  a 
branch. The proof of convexity of R0, along with several other of its
properties, is given in §2.3.

Q. How well does the metric proposed for multi-user sequential decoding
work in the one-user case?

A. The achievable rate region of the proposed metric coincides with R(K)
for every one-user channel K. For K=(P;X;Y), the metric with parameter
(K,k,Q,B) is given as follows.

For each ξ∈X^k, η∈Y^k,

\[ \gamma(\xi,\eta) = \ln\frac{\sqrt{P(\eta|\xi)}}{\sum_{\zeta\in X^k}Q(\zeta)\sqrt{P(\eta|\zeta)}} - kB . \]

For any code parameter (M,k) satisfying (1/k)ln M < R0(K), the appropriate
parameter to be used is found as follows: Q is taken as a p.d. on X^k such
that (1/k)ln M < R0(K,Q), and B is then set equal to {(1/k)ln M + R0(K,Q)}/2.

In Chapter 2, it is proven, as a special case of Theorem 2.2.1, that the
above metric achieves the rate (1/k)ln M. It thus follows that all rates up
to R0(K) are achievable.


Now, compare the above metric with the Fano metric, which is given by

\[ \gamma_F(\xi,\eta) = \ln\frac{P(\eta|\xi)}{\omega(\eta)} - kB_F , \]

and which also achieves all rates up to R0(K) for any single-user channel
K, provided that ω and B_F are chosen appropriately.

Note  that  these  two  metrics  are  not  reducible  to  one  another;  that  is,  it 
is  not  possible,  in  general,  to  choose  the  parameters  of  these  metrics  so 
that  their  ratio  is  fixed. 
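
The following sketch evaluates both branch metrics on a small example and prints their ratio for every input/output pair. It illustrates only one particular parameter choice, not the general claim. The asymmetric binary channel, the rate, the choice of ω as the output distribution induced by Q, and the choice B_F equal to the rate are all assumptions made here for illustration; the text does not specify them.

```python
import math

def proposed_metric(P, Q, B, x, y):
    """ln( sqrt(P(y|x)) / sum_z Q(z) sqrt(P(y|z)) ) - B, for k = 1."""
    denom = sum(Q[z] * math.sqrt(P[z][y]) for z in range(len(Q)))
    return math.log(math.sqrt(P[x][y]) / denom) - B

def fano_metric(P, omega, B_F, x, y):
    """ln( P(y|x) / omega(y) ) - B_F, for k = 1."""
    return math.log(P[x][y] / omega[y]) - B_F

# Assumed example values (not taken from the text).
P = [[0.9, 0.1], [0.3, 0.7]]
Q = [0.5, 0.5]
rate = 0.05
r0 = -math.log(sum(sum(Q[z] * math.sqrt(P[z][y]) for z in range(2)) ** 2
                   for y in range(2)))
B = (rate + r0) / 2                       # midpoint bias, as in the text
omega = [sum(Q[z] * P[z][y] for z in range(2)) for y in range(2)]  # assumption
B_F = rate                                                         # assumption

ratios = [proposed_metric(P, Q, B, x, y) / fano_metric(P, omega, B_F, x, y)
          for x in range(2) for y in range(2)]
print(ratios)
```

For these assumed parameters the four ratios are not all equal, so the two metrics are not proportional here.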

We conjecture that the following metric, which contains the above two as
special cases, also achieves all rates up to R0(K) for each single-user
channel K and for each r, 0.5≤r≤1.

p<n  |  tf 

Chapter 2


AN  INNER  BOUND  TO  THE  ACHIEVABLE  RATE  REGION  OF 
SEQUENTIAL  DECODING 

The  main  result  of  this  chapter  is  the  proof  that  R0(K)  (to  be  defined  in 
§2.2)  is  an  inner  bound  to  R(K)  for  any  multiple  access  channel  K. 

2.1. Sufficient Conditions on Achievability

Let K=(P;X1,...,Xn;Y) be an n-user channel; let Γ be a branchwise additive
metric for (M1,...,Mn,k) codes for K; let γ,

\[ \gamma: (X_1\times\cdots\times X_n)^k \to [-\infty,\infty), \]

be the branch metric for Γ. The value of Γ for a channel input x(..m) and
a channel output y(..m) is thus given by

\[ \Gamma(x(..m),y(..m)) = \sum_{i=1}^{m}\gamma(x(i),y(i)). \]

In this section, we wish to find conditions on K, (M1,...,Mn,k), and γ
which, if satisfied, guarantee that the point R=(R1,...,Rn), where
Ri=(1/k)ln Mi, is achievable in the sense of Definition 1.4.2. We fix K,
(M1,...,Mn,k), and Γ throughout the following discussion, and suppress
them in the notation.

Proving that R is achievable requires exhibiting the existence of a code
e, with rate at least as large as R, for which D_L(e) is uniformly bounded.

A direct approach to this problem is not feasible, because the
computation of D_L(e) is hopelessly complicated for any non-degenerate
code e. We try therefore an indirect approach, known as random-coding,
which is based on the fact that the expected value of a random variable
upper-bounds the value of that random variable at at least one sample
point. Thus, instead of a fixed code, we consider an ensemble of codes,
and evaluate the expected value of D_L(e) over this ensemble.

The ensemble we use here is E=Ens(M1,...,Mn;k;X1,...,Xn;Q1,...,Qn). E will be
fixed throughout the following analysis, and E_e will denote expectation
with respect to the probability measure associated with E.

Now,

\[ E_e D_L(e) = E_e\{C_1(e)+\cdots+C_L(e)\}/L = \{E_e C_1(e)+\cdots+E_e C_L(e)\}/L. \tag{1} \]

So, E_e D_L(e) can be upper-bounded by upper-bounding E_e C_i(e) for each i.

\[ E_e C_i(e) = E_e E_s E_{y|es}\, C_i(e,s,y) = E_s E_e E_{y|es}\, C_i(e,s,y). \tag{2} \]

Here, s represents the source sequence; E_s stands for expectation with
respect to the source statistics; E_{y|es} stands for expectation with
respect to the probability measure on the channel output sequence y
conditional on es being the channel input sequence.

Changing the order of expectations in (2) is justified by the
non-negativity of the terms involved. (See, e.g., page 147 of [19].)

(One can see at this point that E_e E_{y|es} C_i(e,s,y) does not depend on s;
hence, in (2), E_s can be dropped, and s can be replaced by any fixed
source output. But we shall carry along E_s in the following argument.)

E_e C_i(e) will be upper-bounded with the help of the following inequality.

Lemma 2.1.1. For any non-negative t,

\[ C_i(e,s,y) \le \sum_{u(..j)\in I_i(s)}\ \sum_{m\ge i}\exp t\bigl[\Gamma(eu(..j),y(..j)) - \Gamma(es(..m),y(..m))\bigr]. \tag{3} \]

Proof. A node u(..j)∈I_i(s) reaches the stack-top only if

\[ \Gamma(eu(..j),y(..j)) > \Gamma(es(..m),y(..m)) \quad\text{for some } m\ge i. \tag{4} \]

If (4) is not satisfied, s(..m) has precedence over u(..j) in reaching the
stack-top for each m, m≥i. So, u(..j)∈I_i(s) reaches the stack-top only if

\[ 1 \le \sum_{m\ge i}\exp t\{\Gamma(eu(..j),y(..j)) - \Gamma(es(..m),y(..m))\} \quad\text{for all } t\ge 0. \tag{5} \]

Note that the right hand side of (5) is positive whether or not u(..j)
reaches the stack-top; hence, it upper-bounds the indicator function of
the event that u(..j) reaches the stack-top. So, by summing the right
hand side of (5) over all nodes in I_i(s), we obtain the claimed upper
bound on C_i(e,s,y). □
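
The bounding step in the proof can be illustrated numerically. The sketch below is an assumption-laden toy: the correct path is truncated to a finite list of hypothetical metric values, and the helper names are not from the text. It only checks that the exponential sum dominates the indicator of the "some depth m is exceeded" event for several non-negative t.

```python
import math

def indicator_bound(metric_u, correct_metrics, t):
    """Exponential sum for one node u, truncated to the listed depths."""
    return sum(math.exp(t * (metric_u - g)) for g in correct_metrics)

def exceeds_correct_path(metric_u, correct_metrics):
    """True if the metric of u exceeds the correct-path metric at some depth."""
    return any(metric_u > g for g in correct_metrics)

# Hypothetical metric values along the correct path at depths m = i, i+1, ...
correct = [0.4, -0.1, 0.7, 1.3, 2.0]
for metric_u in (-0.5, 0.2, 1.0):
    for t in (0.0, 0.5, 1.0):
        hit = 1 if exceeds_correct_path(metric_u, correct) else 0
        assert hit <= indicator_bound(metric_u, correct, t)
print("indicator bound holds on the sample values")
```

The bound is loose by design; its value lies in being easy to average over the code ensemble.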


Hereafter, suppose that t is a fixed positive number. Now, from (2) and
(3),

\[ E_e C_i(e) \le E_s\sum_{u(..j)\in I_i(s)}\ \sum_{m\ge i} A(s,m,u(..j)), \tag{6} \]

where, by definition,

\[ A(s,m,u(..j)) = E_e E_{y|es}\exp t\{\Gamma(eu(..j),y(..j)) - \Gamma(es(..m),y(..m))\}. \tag{7} \]

For any u(..j)∈I_i(s),

\[ A(s,m,u(..j)) = E_e E_{y|es}\exp t\Bigl\{\sum_{i\le h\le j}\gamma(eu(h),y(h)) - \sum_{i\le h\le m}\gamma(es(h),y(h))\Bigr\}; \]

thus, if j>m,

\[ A(s,m,u(..j)) = E_e E_{y|es}\exp t\Bigl\{\sum_{i\le h\le m}\bigl[\gamma(eu(h),y(h)) - \gamma(es(h),y(h))\bigr] + \sum_{m<h\le j}\gamma(eu(h),y(h))\Bigr\}; \tag{8} \]

and, if m≥j,

\[ A(s,m,u(..j)) = E_e E_{y|es}\exp t\Bigl\{\sum_{i\le h\le j}\bigl[\gamma(eu(h),y(h)) - \gamma(es(h),y(h))\bigr] - \sum_{j<h\le m}\gamma(es(h),y(h))\Bigr\}. \tag{9} \]

Since the labels on branches at different levels are independent random
variables over the ensemble under consideration, (8) and (9) can be
rewritten as follows.

For any u(..j)∈I_i(s), if j>m,

\[ A(s,m,u(..j)) = \prod_{i\le h\le m}E\exp t\{\gamma(eu(h),y(h)) - \gamma(es(h),y(h))\}\ \prod_{m<h\le j}E\exp t\gamma(eu(h),y(h)); \tag{10} \]

and, if m≥j,

\[ A(s,m,u(..j)) = \prod_{i\le h\le j}E\exp t\{\gamma(eu(h),y(h)) - \gamma(es(h),y(h))\}\ \prod_{j<h\le m}E\exp -t\gamma(es(h),y(h)), \tag{11} \]

where the symbol E has been used as an abbreviation for E_e E_{y|es}.

We now wish to find an explicit expression for A(s,m,u(..j)). Let u(..j) be
a fixed node in I_i(s), and (T_1,...,T_j) be the type of u(..j) with respect to s.

Now, for any h∈{1,...,j}, η∈Y^k, ξ=ξ_1×···×ξ_n, and ζ=ζ_1×···×ζ_n, where
ξ_r∈X_r^k, ζ_r∈X_r^k, r=1,...,n, the probability that es(h)=ξ and eu(h)=ζ and
y(h)=η is given as follows.

\[ \Pr\{es(h)=\xi,\ eu(h)=\zeta,\ y(h)=\eta\} = \prod_{1\le r\le n}Q_r(\xi_r)\prod_{r\in T_h}Q_r(\zeta_r)\prod_{r\in T_h^c}\chi(\zeta_r=\xi_r)\ P(\eta|\xi), \tag{12} \]

where χ is the indicator function.

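To simplify the notation, we shall write

Q(ξ) in place of $\prod_{1\le r\le n}Q_r(\xi_r)$,

Q(ξ_T) in place of $\prod_{r\in T}Q_r(\xi_r)$, and

χ(ζ_T=ξ_T) in place of $\prod_{r\in T}\chi(\zeta_r=\xi_r)$.

In this notation, (12) can be rewritten as follows.

\[ \Pr\{es(h)=\xi,\ eu(h)=\zeta,\ y(h)=\eta\} = Q(\xi)\,Q(\zeta_{T_h})\,\chi(\zeta_{T_h^c}=\xi_{T_h^c})\,P(\eta|\xi). \tag{13} \]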

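Now,

\[ E_e E_{y|es}\exp -t\gamma(es(h),y(h)) = \sum_{\eta,\xi}Q(\xi)P(\eta|\xi)\exp -t\gamma(\xi,\eta), \tag{14} \]

\[ E_e E_{y|es}\exp t\{\gamma(eu(h),y(h)) - \gamma(es(h),y(h))\} = \sum_{\eta,\xi,\zeta}Q(\xi)Q(\zeta_{T_h})\chi(\zeta_{T_h^c}=\xi_{T_h^c})P(\eta|\xi)\exp t\{\gamma(\zeta,\eta) - \gamma(\xi,\eta)\}, \tag{15} \]

and

\[ E_e E_{y|es}\exp t\gamma(eu(h),y(h)) = \sum_{\eta,\xi,\zeta}Q(\xi)Q(\zeta_{T_h})\chi(\zeta_{T_h^c}=\xi_{T_h^c})P(\eta|\xi)\exp t\gamma(\zeta,\eta). \tag{16} \]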


We see that the left hand side of (14) does not depend on h; and the left
hand sides of (15) and (16) depend on h only through T_h. So, we define

\[ \eta = E_e E_{y|es}\exp -t\gamma(es(h),y(h)), \]

\[ \sigma(T_h) = E_e E_{y|es}\exp t\{\gamma(eu(h),y(h)) - \gamma(es(h),y(h))\}, \quad\text{and} \]

\[ \xi(T_h) = E_e E_{y|es}\exp t\gamma(eu(h),y(h)). \]

Now, for any node u(..j)∈I_i(s) with type (T_1,...,T_j) wrt s, (10) and (11) can
be rewritten as follows.

\[ A(s,m,u(..j)) = \sigma(T_i)\cdots\sigma(T_m)\,\xi(T_{m+1})\cdots\xi(T_j), \quad j>m; \tag{17} \]

\[ A(s,m,u(..j)) = \sigma(T_i)\cdots\sigma(T_j)\,\eta^{m-j}, \quad m\ge j. \tag{18} \]


Observe that A(s,m,u(..j)) depends on u(..j) only through the type of u(..j)
wrt s. So, let A(s,m,T) denote A(s,m,u(..j)) whenever u(..j) is a node of
type T wrt s. Letting T(i,j) be the set of types for level-j nodes in I_i(s),
(6) can be rewritten as follows.

\[ E_e C_i(e) \le E_s\sum_{j=i}^{\infty}\ \sum_{T\in T(i,j)}\ \sum_{u(..j):\ \text{type of } u(..j)=T}\ \sum_{m=i}^{\infty}A(s,m,u(..j)) = E_s\sum_{j=i}^{\infty}\ \sum_{T\in T(i,j)}N(T)\sum_{m=i}^{\infty}A(s,m,T), \tag{19} \]

where N(T) denotes the number of nodes of type T.


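Define Ω(T)=max{σ(T), ξ(T)}. Now, for any T=(T_1,...,T_j)∈T(i,j),

\[ A(s,m,T) \le \Omega(T_i)\cdots\Omega(T_j), \quad j>m; \tag{20} \]

\[ A(s,m,T) \le \Omega(T_i)\cdots\Omega(T_j)\,\eta^{m-j}, \quad m\ge j. \tag{21} \]

\[ \sum_{m=i}^{\infty}A(s,m,T) \le \sum_{m=i}^{j-1}\Omega(T_i)\cdots\Omega(T_j) + \sum_{m=j}^{\infty}\Omega(T_i)\cdots\Omega(T_j)\,\eta^{m-j} = \Omega(T_i)\cdots\Omega(T_j)\Bigl(j-i+\sum_{h=0}^{\infty}\eta^h\Bigr). \tag{22} \]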


For any non-empty subset T of {1,...,n}, let M(T) be the product of M_i for
i∈T; if T=∅, let M(T)=1. For any node type T=(T_1,...,T_j), let
M(T)=M(T_1)···M(T_j). Note that M(T) is an upper bound on N(T), the number
of nodes of type T. Also note that, if T=(T_1,...,T_j)∈T(i,j), then
M(T)=M(T_i)···M(T_j), because T_h=∅ for 1≤h≤i-1. Define

\[ \Psi = \max\{M(T)\,\Omega(T): T \text{ is a non-empty subset of } \{1,\ldots,n\}\}. \]

Now, by (22),

\[ N(T)\sum_{m=i}^{\infty}A(s,m,T) \le M(T)\sum_{m=i}^{\infty}A(s,m,T) \tag{23} \]

\[ \le \Psi^{\,j-i+1}\Bigl(j-i+\sum_{h=0}^{\infty}\eta^h\Bigr). \tag{24} \]


By (19) and (23)-(24),

\[ E_e C_i(e) \le E_s\sum_{j=i}^{\infty}\ \sum_{T\in T(i,j)}\Psi^{\,j-i+1}\Bigl(j-i+\sum_{h=0}^{\infty}\eta^h\Bigr). \tag{25} \]

Noting that the number of elements in T(i,j) is upper-bounded by (j-i+2)^n
(see §1.3 for this upper bound), it follows from (25) that

\[ E_e C_i(e) \le E_s\sum_{j=i}^{\infty}(j-i+2)^n\,\Psi^{\,j-i+1}\Bigl(j-i+\sum_{h=0}^{\infty}\eta^h\Bigr). \tag{26} \]

The right side of (26) is independent of i; and, it converges if Ψ<1 and
η<1. The conclusion of this discussion can now be stated as follows.

Theorem 2.1.1. Sufficient Conditions on Achievability.

Let K=(P;X1,...,Xn;Y) be a multiple access channel; suppose that there
exist a branch metric γ:(X1×···×Xn)^k → [-∞,∞), an ensemble
E=Ens(M1,...,Mn;k;X1,...,Xn;Q1,...,Qn), and a positive real number t such that

i) η(t,K,γ,E) < 1,

ii) M(T)σ(T,t,K,γ,E) < 1 for each non-empty subset T of {1,...,n}, and

iii) M(T)ξ(T,t,K,γ,E) < 1 for each non-empty subset T of {1,...,n}.

Then, for all L,

\[ E_e D_L(K,e,\Gamma) \le \sum_{j=0}^{\infty}(j+2)^n\,\Psi(t,K,\gamma,E)^{\,j+1}\Bigl(j+\frac{1}{1-\eta(t,K,\gamma,E)}\Bigr) < \infty, \]

where Γ denotes the metric based on γ. * □

Thus, if K, (M1,...,Mn,k), and γ satisfy the conditions of the above
theorem for some ensemble E, then (R1,...,Rn), where Ri=(1/k)ln Mi,
belongs to R(K), the achievable rate region of sequential decoding.
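
To get a feel for the size of the bound in Theorem 2.1.1, the series can be summed numerically. The sketch below assumes the bound has the form sum over j≥0 of (j+2)^n Ψ^(j+1) (j + 1/(1-η)), which is how the garbled display is read here; the function name, the truncation length, and the sample values of Ψ, η, and n are hypothetical.

```python
def complexity_bound(psi, eta, n, terms=5000):
    """Partial sum of sum_{j>=0} (j+2)^n psi^(j+1) (j + 1/(1-eta))."""
    if not (0 <= psi < 1 and 0 <= eta < 1):
        raise ValueError("the series is summed here only for psi < 1 and eta < 1")
    tail = 1.0 / (1.0 - eta)  # value of the geometric series in eta
    return sum((j + 2) ** n * psi ** (j + 1) * (j + tail) for j in range(terms))

# Hypothetical values: one user, psi = eta = 0.5.
print(complexity_bound(0.5, 0.5, n=1))  # approximately 11
```

The bound grows quickly as Ψ approaches 1, which is one way to see why operating close to the boundary of the region is costly.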

* It is possible to prove this theorem with η(t,K,γ,E)<1 relaxed to
η(t,K,γ,E)≤1 by following Gallager's proof for n=1 (see App. 6B of [12]).


2.2.  The  Proposed  Metric  and  An  Inner  Bound  to  Its  Achievable 
Rate  Region 

This section considers a class of metrics and finds an inner bound to its
achievable rate region by using Theorem 2.1.1. Metrics in this class are
parametrized by a four-tuple (K,k,Q,B) where K is a multiple access
channel, say K=(P;X1,...,Xn;Y); k is a positive integer; Q=(Q1,...,Qn) where
Qi is a p.d. on Xi^k, i=1,...,n; and B is a real-valued function of non-empty
subsets of {1,...,n}. B(T) is called the bias term for subset T.

The metric with parameter (K,k,Q,B), denoted by met(K,k,Q,B), is a
branchwise additive metric based on the following branch metric.
For each η∈Y^k and ξ=ξ_1×···×ξ_n where ξ_i∈Xi^k,

\[ \gamma(\xi,\eta) = \min_{T}\gamma_T(\xi,\eta), \tag{1} \]

where the minimum is over all non-empty subsets of {1,...,n} and

\[ \gamma_T(\xi,\eta) = \ln\frac{\sqrt{P(\eta|\xi)}}{\sum_{\zeta_T}\prod_{i\in T}Q_i(\zeta_i)\sqrt{P(\eta|\{\zeta_i\}_{i\in T},\{\xi_i\}_{i\in T^c})}} - kB(T). \tag{2} \]

In (2), the summation is over the cartesian product, over all i∈T, of Xi^k;
P(η|ξ) is the transition probability of the channel over blocks of length
k; P(η|{ζ_i}_{i∈T},{ξ_i}_{i∈T^c}) is the probability that η is received given
that the transmitted block at input i equals ζ_i if i∈T and ξ_i if i∈T^c.


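To simplify the notation, as in the previous section, we shall denote

\[ \prod_{i\in T}Q_i(\zeta_i) \ \text{by}\ Q(\zeta_T), \quad\text{and}\quad P(\eta|\{\zeta_i\}_{i\in T},\{\xi_i\}_{i\in T^c}) \ \text{by}\ P(\eta|\zeta_T,\xi_{T^c}). \]

In this notation, (2) can be rewritten as follows.

\[ \gamma_T(\xi,\eta) = \ln\frac{\sqrt{P(\eta|\xi)}}{\sum_{\zeta_T}Q(\zeta_T)\sqrt{P(\eta|\zeta_T,\xi_{T^c})}} - kB(T). \tag{3} \]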

The  remainder  of  this  section  is  devoted  to  showing  that  R0(K),  which 
we  shall  define  next,  is  an  inner  bound  to  the  achievable  rate  region  of 
the  above  class  of  metrics  (hence,  an  inner  bound  to  R(K))  for  all  K. 
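
A minimal sketch of the branch metric for block length k=1 is given below, assuming that the metric is the minimum over non-empty T of ln( sqrt(P(y|x)) / sum over z_T of Q(z_T) sqrt(P(y|z_T, x_Tc)) ) - B(T). The dictionary-based channel representation, the 0-based user indices, the noiseless binary adder channel, and the bias values are all hypothetical choices made for illustration.

```python
import itertools
import math

def branch_metric(P, Q, bias, x, y):
    """Minimum over non-empty T of gamma_T(x, y), for block length k = 1.

    P maps an input tuple (x_1, ..., x_n) to a list of output probabilities,
    Q[i][a] is user i's p.d. (assumed strictly positive), and bias maps each
    non-empty subset T (a tuple of 0-based user indices) to B(T).
    """
    n = len(Q)
    p = P[tuple(x)][y]
    if p == 0:
        return -math.inf
    best = math.inf
    for size in range(1, n + 1):
        for T in itertools.combinations(range(n), size):
            denom = 0.0
            for zT in itertools.product(*(range(len(Q[i])) for i in T)):
                z = list(x)          # users outside T keep their symbols
                weight = 1.0
                for i, a in zip(T, zT):
                    z[i] = a
                    weight *= Q[i][a]
                denom += weight * math.sqrt(P[tuple(z)][y])
            best = min(best, 0.5 * math.log(p) - math.log(denom) - bias[T])
    return best

# Assumed example: noiseless binary adder channel, y = x_1 + x_2.
adder = {(a, b): [1.0 if y == a + b else 0.0 for y in range(3)]
         for a in range(2) for b in range(2)}
Q = [[0.5, 0.5], [0.5, 0.5]]
bias = {(0,): 0.5, (1,): 0.5, (0, 1): 0.8}   # hypothetical bias values
print(branch_metric(adder, Q, bias, (0, 0), 0))
print(branch_metric(adder, Q, bias, (0, 1), 1))
```

Taking the minimum over T means that a branch scores well only if it looks good against every type of incorrect path at once.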

Definition 2.2.1.

For any channel K=(P;X1,...,Xn;Y), any Q=(Q1,...,Qn) where Qi is a p.d. on
Xi^k, and any subset T of {1,...,n}, we define

\[ R_0(K,Q,T) = -\frac{1}{k}\ln\sum_{\eta,\xi_{T^c}}Q(\xi_{T^c})\Bigl\{\sum_{\xi_T}Q(\xi_T)\sqrt{P(\eta|\xi_T,\xi_{T^c})}\Bigr\}^2, \]

and

\[ R_0(K,Q) = \{(R_1,\ldots,R_n): 0\le R(T)\le R_0(K,Q,T) \text{ for each subset } T \text{ of } \{1,\ldots,n\}\}. \]

We also define

\[ R_0(K,k) = \bigcup_{Q}R_0(K,Q), \]

where the union is over all Q=(Q1,...,Qn) such that Qi is a p.d. on Xi^k, and

\[ R_0(K) = \bigcup_{k=1}^{\infty}R_0(K,k). \]

□
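
The quantities in Definition 2.2.1 can be computed by brute force for small alphabets. The sketch below handles block length k=1 only and assumes that R(T) is the sum of the rates R_i over i in T. The function names, the dictionary channel format, and the binary adder channel example are hypothetical.

```python
import itertools
import math

def r0_subset(P, Q, T):
    """R0(K,Q,T) in nats for block length k = 1 (T: tuple of 0-based users)."""
    n = len(Q)
    Tc = [i for i in range(n) if i not in T]
    num_outputs = len(next(iter(P.values())))
    total = 0.0
    for y in range(num_outputs):
        for xc in itertools.product(*(range(len(Q[i])) for i in Tc)):
            outer = math.prod(Q[i][a] for i, a in zip(Tc, xc))
            inner = 0.0
            for xT in itertools.product(*(range(len(Q[i])) for i in T)):
                x = [0] * n
                for i, a in zip(Tc, xc):
                    x[i] = a
                for i, a in zip(T, xT):
                    x[i] = a
                weight = math.prod(Q[i][a] for i, a in zip(T, xT))
                inner += weight * math.sqrt(P[tuple(x)][y])
            total += outer * inner ** 2
    return -math.log(total)

def in_r0_region(P, Q, rates):
    """True if the sum of R_i over T is at most R0(K,Q,T) for every non-empty T."""
    n = len(Q)
    for size in range(1, n + 1):
        for T in itertools.combinations(range(n), size):
            if sum(rates[i] for i in T) > r0_subset(P, Q, T):
                return False
    return True

# Assumed example: noiseless binary adder channel with uniform inputs.
adder = {(a, b): [1.0 if y == a + b else 0.0 for y in range(3)]
         for a in range(2) for b in range(2)}
Q = [[0.5, 0.5], [0.5, 0.5]]
print(r0_subset(adder, Q, (0,)), r0_subset(adder, Q, (0, 1)))
print(in_r0_region(adder, Q, [0.4, 0.4]), in_r0_region(adder, Q, [0.6, 0.6]))
```

In this example the single-user value is ln 2 and the joint value is ln(8/3), so the sum-rate constraint is the binding one.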

R0(K) will be shown to be an inner bound to the achievable rate region of
met(K,k,Q,B) with the help of the following fact, which is just a special
case of Theorem 2.1.1 at t=1 and in the particular way E is selected.

Lemma 2.2.1. Sufficient Conditions on Achievability for met(K,k,Q,B).
For any channel K=(P;X1,...,Xn;Y) and any (M1,...,Mn,k), the point (R1,...,Rn),
where Ri=(1/k)ln Mi, belongs to the achievable rate region of met(K,k,Q,B)
if the following conditions are satisfied by γ, the branch metric for
met(K,k,Q,B), and the ensemble E=Ens(M1,...,Mn;k;X1,...,Xn;Q1,...,Qn), where
Q1,...,Qn are such that (Q1,...,Qn)=Q.

1) η(1,K,γ,E) < 1,

2) M(T)σ(T,1,K,γ,E) < 1 for each non-empty subset T of {1,...,n}, and

3) M(T)ξ(T,1,K,γ,E) < 1 for each non-empty subset T of {1,...,n}. □

Note  that  in  the  above  lemma  the  distributions  parametrizing  E  and  the 
metric  are  identical.  Of  course,  the  statement  of  the  lemma  would  still 
hold  if  this  were  not  so,  but  this  less  general  form  is  sufficient  for  our 
purposes. 

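In order to restate Lemma 2.2.1 in a simpler, more useful way, we now
find upper bounds on η(1,K,γ,E), σ(T,1,K,γ,E), and ξ(T,1,K,γ,E) for a
fixed collection of K, γ, and E, where E and γ are parametrized by the
same Q=(Q1,...,Qn).

\[
\begin{aligned}
\eta(1,K,\gamma,E) &= \sum_{\eta,\xi}Q(\xi)P(\eta|\xi)\exp -\gamma(\xi,\eta) \\
&\le \sum_{T\neq\emptyset}\ \sum_{\eta,\xi}Q(\xi)P(\eta|\xi)\Bigl\{\sum_{\zeta_T}Q(\zeta_T)\sqrt{P(\eta|\zeta_T,\xi_{T^c})}\Big/\sqrt{P(\eta|\xi)}\Bigr\}\exp kB(T) \\
&= \sum_{T\neq\emptyset}\ \sum_{\eta,\xi}Q(\xi)\sqrt{P(\eta|\xi)}\sum_{\zeta_T}Q(\zeta_T)\sqrt{P(\eta|\zeta_T,\xi_{T^c})}\ \exp kB(T) \\
&= \sum_{T\neq\emptyset}\ \sum_{\eta,\xi_{T^c}}Q(\xi_{T^c})\sum_{\xi_T}Q(\xi_T)\sqrt{P(\eta|\xi_T,\xi_{T^c})}\sum_{\zeta_T}Q(\zeta_T)\sqrt{P(\eta|\zeta_T,\xi_{T^c})}\ \exp kB(T) \\
&= \sum_{T\neq\emptyset}\ \sum_{\eta,\xi_{T^c}}Q(\xi_{T^c})\Bigl\{\sum_{\xi_T}Q(\xi_T)\sqrt{P(\eta|\xi_T,\xi_{T^c})}\Bigr\}^2\exp kB(T) \\
&= \sum_{T\neq\emptyset}\exp -k\{R_0(K,Q,T)-B(T)\}.
\end{aligned}
\tag{4}
\]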

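For notational convenience, in the following γ(ξ_T,ξ_{T^c},η) and γ(ξ,η) will
be used interchangeably.

\[
\begin{aligned}
\sigma(T,1,K,\gamma,E) &= \sum_{\xi,\zeta_T,\eta}Q(\xi)Q(\zeta_T)P(\eta|\xi)\exp\{\gamma(\zeta_T,\xi_{T^c},\eta)-\gamma(\xi,\eta)\} \\
&\le \sum_{\xi,\zeta_T,\eta}Q(\xi)Q(\zeta_T)P(\eta|\xi)\exp\{\gamma_T(\zeta_T,\xi_{T^c},\eta)-\gamma(\xi,\eta)\} \\
&= \sum_{\xi,\zeta_T,\eta}Q(\xi)Q(\zeta_T)P(\eta|\xi)\,\frac{\sqrt{P(\eta|\zeta_T,\xi_{T^c})}\ \exp -\gamma(\xi,\eta)\ \exp -kB(T)}{\sum_{\phi_T}Q(\phi_T)\sqrt{P(\eta|\phi_T,\xi_{T^c})}} \\
&= \exp -kB(T)\sum_{\xi,\eta}Q(\xi)P(\eta|\xi)\,\frac{\sum_{\zeta_T}Q(\zeta_T)\sqrt{P(\eta|\zeta_T,\xi_{T^c})}}{\sum_{\phi_T}Q(\phi_T)\sqrt{P(\eta|\phi_T,\xi_{T^c})}}\ \exp -\gamma(\xi,\eta) \\
&= \exp -kB(T)\sum_{\xi,\eta}Q(\xi)P(\eta|\xi)\exp -\gamma(\xi,\eta) \\
&= \eta(1,K,\gamma,E)\exp -kB(T).
\end{aligned}
\tag{5}
\]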

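Finally,

\[
\begin{aligned}
\xi(T,1,K,\gamma,E) &= \sum_{\xi,\zeta_T,\eta}Q(\xi)Q(\zeta_T)P(\eta|\xi)\exp\gamma(\zeta_T,\xi_{T^c},\eta) \\
&\le \sum_{\xi,\zeta_T,\eta}Q(\xi)Q(\zeta_T)P(\eta|\xi)\exp\gamma_T(\zeta_T,\xi_{T^c},\eta) \\
&= \sum_{\xi,\zeta_T,\eta}Q(\xi)Q(\zeta_T)P(\eta|\xi)\,\frac{\sqrt{P(\eta|\zeta_T,\xi_{T^c})}}{\sum_{\phi_T}Q(\phi_T)\sqrt{P(\eta|\phi_T,\xi_{T^c})}}\ \exp -kB(T) \\
&= \exp -kB(T).
\end{aligned}
\tag{6}
\]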


It follows from (4)-(6) that conditions of Lemma 2.2.1 are satisfied if

\[ \sum_{T\neq\emptyset}\exp -k\{R_0(K,Q,T)-B(T)\} < 1, \tag{7} \]

\[ M(T)\exp -kB(T)\sum_{S\neq\emptyset}\exp -k\{R_0(K,Q,S)-B(S)\} < 1 \quad\text{for each non-empty } T, \tag{8} \]

and

\[ M(T)\exp -kB(T) < 1 \quad\text{for each non-empty } T. \tag{9} \]

We notice that (8) is redundant as a condition, because (8) is satisfied
whenever (7) and (9) are satisfied.

We  can  therefore  express  Lemma  2.2.1  in  the  following  weaker  but  more 
readily  applicable  form. 

Lemma 2.2.2. For any channel K=(P;X1,...,Xn;Y) and any (M1,...,Mn,k), the
point R=(R1,...,Rn), where Ri=(1/k)ln Mi, belongs to the achievable rate
region of met(K,k,Q,B) if

1) $\sum_{T\neq\emptyset}\exp -k\{R_0(K,Q,T)-B(T)\} < 1$, and

2) M(T) exp -kB(T) < 1 for each non-empty T. □

Using this lemma and the following definition, we are now in a position
to give an inner bound to the achievable rate region of met(K,k,Q,B).

Definition 2.2.2.

For any channel K=(P;X1,...,Xn;Y), any M=(M1,...,Mn,k), and any Q=(Q1,...,Qn)
where Qi is a p.d. on Xi^k, we define

\[ \delta(K,M,Q) = \min_{T}\{R_0(K,Q,T)-R(T)\}, \]

where the minimum is taken over all non-empty subsets of {1,...,n}, and
R(T) is defined as (1/k)ln M(T) for any subset T of {1,...,n}. □

Lemma 2.2.3. Inner Bound to Achievable Rate Region of met(K,k,Q,B).

For any K=(P;X1,...,Xn;Y) and M=(M1,...,Mn,k), the point R=(R1,...,Rn), where
Ri=(1/k)ln Mi, belongs to the achievable rate region of met(K,k,Q,B) if
δ(K,M,Q) > (2/k)ln(2^n-1), provided that the bias terms are selected such
that B(T)=(R0(K,Q,T)+R(T))/2 for each T.


Proof. Suppose that δ(K,M,Q) > (2/k)ln(2^n-1). It suffices to verify that
conditions 1) and 2) of Lemma 2.2.2 are satisfied.

1)
\[
\begin{aligned}
\sum_{T\neq\emptyset}\exp -k\{R_0(K,Q,T)-B(T)\}
&= \sum_{T\neq\emptyset}\exp -k\{R_0(K,Q,T)-(R_0(K,Q,T)+R(T))/2\} \\
&= \sum_{T\neq\emptyset}\exp -k\{(R_0(K,Q,T)-R(T))/2\} \\
&\le \sum_{T\neq\emptyset}\exp -k\{\delta(K,M,Q)/2\} \\
&= (2^n-1)\exp -k\{\delta(K,M,Q)/2\} < 1.
\end{aligned}
\]

The last two steps follow by noting that 2^n-1 is the number of
non-empty subsets of {1,...,n}, and that δ(K,M,Q) > (2/k)ln(2^n-1).

2)
\[
\begin{aligned}
M(T)\exp -kB(T) &= M(T)\exp -k\{(R_0(K,Q,T)+R(T))/2\} \\
&= \exp -k\{(R_0(K,Q,T)-R(T))/2\} < 1 \quad\text{for all non-empty } T,
\end{aligned}
\]
since δ(K,M,Q)>0. □
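
The two conditions verified in the proof can also be checked numerically for given values of R0(K,Q,T) and R(T). The sketch below uses the midpoint bias B(T)=(R0(K,Q,T)+R(T))/2 and the identity M(T)=exp kR(T); the function name and the sample numbers for a two-user case are hypothetical.

```python
import math

def check_conditions(r0, rate, k):
    """Evaluate the two conditions with the midpoint bias.

    r0[T] and rate[T] hold R0(K,Q,T) and R(T) for every non-empty subset T.
    Returns (delta, condition-1 sum, largest condition-2 value, margin test).
    """
    bias = {T: (r0[T] + rate[T]) / 2 for T in r0}
    cond1 = sum(math.exp(-k * (r0[T] - bias[T])) for T in r0)
    # M(T) = exp(k R(T)), so M(T) exp(-k B(T)) = exp(k (R(T) - B(T))).
    cond2 = max(math.exp(k * (rate[T] - bias[T])) for T in r0)
    delta = min(r0[T] - rate[T] for T in r0)
    margin_ok = delta > (2 / k) * math.log(len(r0))  # len(r0) = 2^n - 1
    return delta, cond1, cond2, margin_ok

# Hypothetical two-user values.
r0 = {(0,): 0.69, (1,): 0.69, (0, 1): 0.98}
rate = {(0,): 0.30, (1,): 0.30, (0, 1): 0.60}
print(check_conditions(r0, rate, k=10))
print(check_conditions(r0, rate, k=1))
```

With these numbers the conditions hold at k=10 but the first one fails at k=1, which illustrates why the margin requirement involves the branch length k.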

Lemma 2.2.4. For all K, R0(K) is an inner bound to the achievable rate
region of the proposed class of metrics.

Proof. In view of Lemma 2.2.3, it suffices to prove the following
statement: For any channel K=(P;X1,...,Xn;Y) and any point R=(R1,...,Rn),
suppose that there exist M=(M1,...,Mn,k) and Q=(Q1,...,Qn), where Qi is a
p.d. on Xi^k, such that (1/k)ln Mi ≥ Ri, i=1,...,n, and δ(K,M,Q)>0. Then, there
exist H=(H1,...,Hn,h) and U=(U1,...,Un), where Ui is a p.d. on Xi^h, such that
(1/h)ln Hi ≥ Ri, i=1,...,n, and δ(K,H,U) > (2/h)ln(2^n-1).

Suppose that M and Q satisfy the hypothesis of the above statement. Let
U be such that Ui is the mth product of Qi, i=1,...,n; i.e., Ui is a p.d. on
Xi^{mk} such that, for each (ξ_1,...,ξ_{mk})∈Xi^{mk},

\[ U_i(\xi_1,\ldots,\xi_{mk}) = Q_i(\xi_1,\ldots,\xi_k)\,Q_i(\xi_{k+1},\ldots,\xi_{2k})\cdots Q_i(\xi_{(m-1)k+1},\ldots,\xi_{mk}). \]

Let H be such that Hi=Mi^m and h=mk.

It is easy to verify that R0(K,Q,T)=R0(K,U,T) for all T, and that δ(K,M,Q) =
δ(K,H,U). So, by simply taking m large enough, we can satisfy δ(K,H,U) >
(2/h)ln(2^n-1). □
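
The phrase "taking m large enough" can be made concrete: since δ is unchanged by the product construction and h=mk, the requirement is m > 2 ln(2^n-1)/(k δ). The helper below is a sketch of that arithmetic only; its name and the sample values are hypothetical.

```python
import math

def min_product_order(delta, k, n):
    """Smallest integer m with delta > (2 / (m k)) ln(2^n - 1)."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    threshold = 2 * math.log(2 ** n - 1) / (k * delta)
    return math.floor(threshold) + 1  # strict inequality is required

# Hypothetical values: two users, k = 1, delta = 0.38.
m = min_product_order(0.38, k=1, n=2)
print(m)  # 6
```

The required branch length mk grows in inverse proportion to δ, so rates close to the boundary of the region call for long branches.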

As  a  corollary  to  Lemma  2.2.4,  we  have  the  main  result  of  this  chapter. 
Theorem  2.2.1.  R0(K)  is  an  inner  bound  to  R(K)  for  all  K.  □ 


1) No examples are known for which R is strictly larger than R0. On the
other  hand,  it  is  not  known  if  R0(K)=R(K)  for  all  K.  In  the  next  chapter, 
it  will  be  shown  that  R0=R  for  single-user  channels  (see  §3.2)  and  also 
for  pairwise  reversible  channels  (see  §3.3). 


2)  At  this  point,  it  is  natural  to  ask  whether  there  exists  a  class  of 
metrics  which  satisfies  the  conditions  of  Theorem  2.1.1  over  a  set  of 
points  larger  than  R0.  §2.4  will  prove  that  there  is  no  such  class. 

3) One might also ask whether the metric of Example 1.4.1 (the Fano
metric) satisfies the conditions of Theorem 2.1.1 over all (interior)
points of R0(K) for all K. Assuming that the parameters of the metric are
set in the way suggested in Example 1.4.1, the answer is no. A simple
counter-example is a (pseudo) two-user channel which is the parallel
combination of two independent binary symmetric channels. By choosing
the crossover probabilities of the binary symmetric channels
appropriately (one close to 1/2, the other close to 0), one can obtain a
situation where the Fano metric has a positive drift (in an ensemble
average sense) on each path whose component path for the less noisy
subchannel is correct.

4) The proof of Lemma 2.2.4 suggests a method for finding an
appropriate metric in any given situation. Suppose, for example, that
K=(P;X1,...,Xn;Y) is the channel and R=(R1,...,Rn) is the desired rate. We
first try to find M=(M1,...,Mn,k) and Q=(Q1,...,Qn), where Qi is a p.d. on
Xi^k, such that (1/k)ln Mi ≥ Ri and δ(K,M,Q) > (2/k)ln(2^n-1). Supposing that
such a pair is found, then the metric met(K,k,Q,B), with bias
B(T)=(R0(K,Q,T)+R(T))/2 for each T, is an appropriate metric for this
situation. If we decide to use this metric, then we may select the tree
code at random according to the probability measure associated with the
ensemble Ens(M1,...,Mn;k;X1,...,Xn;Q1,...,Qn). There is no guarantee that
such a randomly selected code will perform satisfactorily; but the
probability that its performance is much worse than average is small.

5) If the stack algorithm is applied to a tree code with parameter
M=(M1,...,Mn,k), each step of the algorithm requires the evaluation of the
metric values of M1···Mn nodes. Ordinarily, one is given a desired rate
R=(R1,...,Rn) and the code parameter M=(M1,...,Mn,k) is chosen so that
(1/k)ln Mi ≥ Ri is satisfied for each i∈{1,...,n}. From the viewpoint of
computational complexity, it is thus preferable to select M so that k is
the minimum possible subject to the rate constraints.

If we wish to use met(K,k,Q,B), with bias B(T) = (R0(K,Q,T)+R(T))/2 for
each T, there is an additional constraint that M has to meet, namely,
δ(K,M,Q) > (2/k)ln(2^n-1). This constraint is unpleasant because it forces k
to get large as the desired rate approaches the boundary of R0(K). It is
not  known  at  present  whether  a  constraint  of  this  type  is  inherent  in 
multi-user  sequential  decoding  or  whether  one  can  find  metrics  which  do 
not  suffer  from  this  problem. 


2.3.  Some  Properties  of  R0 


This section summarizes some of what is known about the R0 region.

In §1.5, it was shown that R0(K)=R0(K,1) for any single-user channel K.
In the case of multi-user channels, however, this is no longer true; there
are channels for which R0(K)≠R0(K,1). An example is the two-user M-ary
collision channel K=(P;X1,X2;Y), where M is an integer greater than 2,
X1=X2={0,1,...,M-1}, Y={e,0,1,...,M-1}, and the transition probabilities are
as follows. P(x1|x1,0)=P(x2|0,x2)=1 for each x1∈X1 and x2∈X2;
P(e|x1,x2)=1 if x1∈{1,...,M-1} and x2∈{1,...,M-1}; and, all other transitions
have zero probability. We leave it to the reader to verify that the point
((1/2)ln M nats, (1/2)ln M nats) belongs to R0(K,2) but not to R0(K,1).

By considering collision channels with larger numbers of users, it can be
seen that, for any fixed m, there exists a channel K for which R0(K)≠
R0(K,1)∪···∪R0(K,m).

R0 is convex. This is a simple result of admitting probability
distributions over blocks of arbitrary length in the definition of R0. The
convexity of R0 can be proved by observing that, for any pair, Q1 and Q2,
of vectors of p.d.'s over block-lengths k1 and k2, and for any pair of
integers m1 and m2, the vector of p.d.'s Q, defined as
Q=Q1^{k2 m1} Q2^{k1 m2}, satisfies
(m1+m2)R0(Q,T) = m1 R0(Q1,T) + m2 R0(Q2,T) for all T. Here, the
components of Q1^{k2 m1} are k2 m1-order product forms of the corresponding
components of Q1, and similarly for Q2^{k1 m2}. The components of Q are
product forms of the corresponding components of Q1^{k2 m1} and Q2^{k1 m2}.
The components of Q1^{k2 m1} and Q2^{k1 m2} are thus p.d.'s over
block-lengths of k1k2m1 and k1k2m2, respectively; and the components of Q
are p.d.'s over a block-length of k1k2(m1+m2).
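
The time-sharing identity can be checked numerically in the simplest setting. The sketch below is restricted to a one-user memoryless channel with k1=k2=1 (so that Q is just m1 copies of Q1 followed by m2 copies of Q2); the channel, the two distributions, and the helper names are assumptions made for illustration.

```python
import itertools
import math

def r0_block(P, Qblock, k):
    """R0 per channel use for a one-user memoryless channel.

    Qblock maps each length-k input tuple to its probability.
    """
    total = 0.0
    for y in itertools.product(range(len(P[0])), repeat=k):
        inner = sum(q * math.sqrt(math.prod(P[a][b] for a, b in zip(x, y)))
                    for x, q in Qblock.items())
        total += inner ** 2
    return -math.log(total) / k

def product_pd(Qa, Qb):
    """Product form of two block p.d.'s (the blocks are concatenated)."""
    return {xa + xb: qa * qb for xa, qa in Qa.items() for xb, qb in Qb.items()}

P = [[0.9, 0.1], [0.3, 0.7]]             # assumed example channel
Q1 = {(0,): 0.5, (1,): 0.5}              # block length k1 = 1
Q2 = {(0,): 0.9, (1,): 0.1}              # block length k2 = 1
m1, m2 = 2, 1
Q = product_pd(product_pd(Q1, Q1), Q2)   # m1 copies of Q1, then m2 of Q2
lhs = (m1 + m2) * r0_block(P, Q, m1 + m2)
rhs = m1 * r0_block(P, Q1, 1) + m2 * r0_block(P, Q2, 1)
print(lhs, rhs)
```

The two printed values agree up to rounding because the sum inside the logarithm factors over the positions of the block.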

For  any  given  m,  there  exists  a  channel  K  (e.g.,  a  collision  channel)  for 
which R0(K,m) is not convex. It is not known, however, if there exists K
such that R0(K)≠convex-hull R0(K,1). If there were no such channel, then
we would have a characterization of R0 similar to that for the capacity




By using the parallel channels theorem (pp. 149-150, [12]), it can be
proved that, for any K, i, and m,

\[ \max\{R_i: (0,\ldots,R_i,\ldots,0)\in\text{convex-hull } R_0(K,1)\} = \max\{R_i: (0,\ldots,R_i,\ldots,0)\in\text{convex-hull } R_0(K,m)\}. \]

This  can  be  seen  directly  by  noting  that,  if  all  users,  except  for  user  i, 
are  constrained  to  transmit  at  rate  zero  (which  means  that  each  such 
user  transmits  a  fixed  sequence),  then  the  situation  reduces  to  the 
single-user  case,  for  which  we  know  that  the  stated  result  holds.  This 
result  is  useful  in  that  it  provides  some  information  about  the  relative 
sizes  of  the  regions  R0(K,m),  m=1,2,... 

We  now  prove  some  inequalities  about  the  R0  region. 

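For any K, Q, S, and T, if T is a subset of S, then

\[ R_0(K,Q,T) \le R_0(K,Q,S). \tag{1} \]

Proof. Let m be the block-length for Q. Now,

\[
\begin{aligned}
mR_0(K,Q,S) &= -\ln\sum_{\eta,\xi_{S^c}}Q(\xi_{S^c})\Bigl\{\sum_{\xi_S}Q(\xi_S)\sqrt{P(\eta|\xi)}\Bigr\}^2 \\
&= -\ln\sum_{\eta,\xi_{S^c}}Q(\xi_{S^c})\Bigl\{\sum_{\xi_{S\setminus T}}Q(\xi_{S\setminus T})\sum_{\xi_T}Q(\xi_T)\sqrt{P(\eta|\xi)}\Bigr\}^2 && (2) \\
&\ge -\ln\sum_{\eta,\xi_{S^c}}Q(\xi_{S^c})\sum_{\xi_{S\setminus T}}Q(\xi_{S\setminus T})\Bigl\{\sum_{\xi_T}Q(\xi_T)\sqrt{P(\eta|\xi)}\Bigr\}^2 && (3) \\
&= -\ln\sum_{\eta,\xi_{T^c}}Q(\xi_{T^c})\Bigl\{\sum_{\xi_T}Q(\xi_T)\sqrt{P(\eta|\xi)}\Bigr\}^2 \\
&= mR_0(K,Q,T),
\end{aligned}
\]

where (3) follows from (2) by Jensen's inequality:

\[ \Bigl\{\sum_{\xi_{S\setminus T}}Q(\xi_{S\setminus T})\sum_{\xi_T}Q(\xi_T)\sqrt{P(\eta|\xi)}\Bigr\}^2 \le \sum_{\xi_{S\setminus T}}Q(\xi_{S\setminus T})\Bigl\{\sum_{\xi_T}Q(\xi_T)\sqrt{P(\eta|\xi)}\Bigr\}^2 . \]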

In the proof of (1), if we replace T by the empty set, we obtain the proof
of another basic fact, namely, R0(K,Q,S)≥0 for all K, Q, and non-empty S.

For any subset of users T, let $P(\eta|\xi_T) = \sum_{\xi_{T^c}}Q(\xi_{T^c})P(\eta|\xi)$.
P(η|ξ_T) is the transition probability that would be observed between
the users in set T and the receiver if the users in set T^c collectively
transmitted a given symbol ξ_{T^c} with probability Q(ξ_{T^c}). If one is only

interested  in  decoding  the  messages  of  the  users  in  a  set  T,  then  one 
may  model  the  remaining  users  as  noise  sources  and  thus  obtain  a 
reduced  channel.  Such  schemes  will  be  the  subject  of  Chapter  4.  The 
following  inequality  is  of  interest  in  comparing  the  achievable  rates  for 
the  reduced  channel  with  those  for  the  original  one. 

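For any K, Q, and T,

\[ -\ln\sum_{\eta}\Bigl\{\sum_{\xi_T}Q(\xi_T)\sqrt{P(\eta|\xi_T)}\Bigr\}^2 \le mR_0(K,Q,T), \tag{4} \]

where m is the block-length for Q.

Proof.

\[
\begin{aligned}
mR_0(K,Q,T) &= -\ln\sum_{\eta,\xi_{T^c}}Q(\xi_{T^c})\Bigl\{\sum_{\xi_T}Q(\xi_T)\sqrt{P(\eta|\xi)}\Bigr\}^2 \\
&= -\ln\sum_{\eta}\sum_{\xi_{T^c}}\Bigl\{\sum_{\xi_T}Q(\xi_T)\sqrt{Q(\xi_{T^c})P(\eta|\xi)}\Bigr\}^2 && (5) \\
&\ge -\ln\sum_{\eta}\Bigl\{\sum_{\xi_T}Q(\xi_T)\sqrt{P(\eta|\xi_T)}\Bigr\}^2, && (6)
\end{aligned}
\]

where (6) follows from (5) by the following inequality.

\[ \sum_{\xi_{T^c}}\Bigl\{\sum_{\xi_T}Q(\xi_T)\sqrt{Q(\xi_{T^c})P(\eta|\xi)}\Bigr\}^2 \le \Bigl\{\sum_{\xi_T}Q(\xi_T)\sqrt{\sum_{\xi_{T^c}}Q(\xi_{T^c})P(\eta|\xi)}\Bigr\}^2 \tag{7} \]

(7) is proved by using Minkowski's inequality (see inequality h on p.524
in [12]), which, in the form needed here, states that, for any collection
of non-negative real numbers {a_jk} and any p.d. {Q_j},

\[ \sum_{k}\Bigl\{\sum_{j}Q_j\sqrt{a_{jk}}\Bigr\}^2 \le \Bigl\{\sum_{j}Q_j\sqrt{\sum_{k}a_{jk}}\Bigr\}^2 . \]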

In  a  sense,  this  inequality  confirms  the  obvious  fact  that  codebook 
knowledge  of  all  users  can  be  used  to  improve  the  achievable  rate  region 
in  sequential  decoding. 
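
The comparison between the reduced channel and the original one can be illustrated for two users and block length 1, with T consisting of user 1 only. In the sketch below, the nested-list channel format, the function names, and the noiseless binary adder channel are assumptions made for illustration.

```python
import math

def r0_user1_full(P, Q1, Q2):
    """R0(K,Q,T) for T = {user 1}: user 2's codeword is known to the decoder."""
    total = 0.0
    for y in range(len(P[0][0])):
        for x2 in range(len(Q2)):
            inner = sum(Q1[x1] * math.sqrt(P[x1][x2][y]) for x1 in range(len(Q1)))
            total += Q2[x2] * inner ** 2
    return -math.log(total)

def r0_user1_reduced(P, Q1, Q2):
    """Same quantity for the reduced channel, with user 2 averaged in as noise."""
    total = 0.0
    for y in range(len(P[0][0])):
        inner = 0.0
        for x1 in range(len(Q1)):
            reduced = sum(Q2[x2] * P[x1][x2][y] for x2 in range(len(Q2)))
            inner += Q1[x1] * math.sqrt(reduced)
        total += inner ** 2
    return -math.log(total)

# Assumed example: noiseless binary adder channel, P[x1][x2][y], y = x1 + x2.
adder = [[[1.0 if y == a + b else 0.0 for y in range(3)] for b in range(2)]
         for a in range(2)]
Q1 = [0.5, 0.5]
Q2 = [0.5, 0.5]
print(r0_user1_reduced(adder, Q1, Q2), r0_user1_full(adder, Q1, Q2))
```

For this example the reduced-channel value is -ln 0.75 while the full value is ln 2, so treating the other user as noise costs a substantial part of the rate.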


2.4. A Result on the Method of §2.1

In  this  section  we  prove  that  there  is  no  branchwise  additive  metric 
which  satisfies  the  sufficient  conditions  on  achievability  of  Theorem 
2.1.1  at  any  given  point  outside  R0.  This  means  that,  if  there  is  an 
achievable point outside R0, the achievability of that point cannot be
shown  by  using  Theorem  2.1.1.  This,  of  course,  does  not  mean  that  R0 
equals  R,  the  achievable  rate  region  of  sequential  decoding.  Thus,  the 
results  of  this  section  are  not  directly  related  to  sequential  decoding, 
but  rather  to  the  limitations  of  the  particular  method  of  §2.1  in  terms  of 
proving  achievability. 

The  above  result  is  proved  in  two  steps.  First,  Theorem  2.4.1  gives  an 
outer  bound,  for  any  given  metric,  to  the  rate  region  where  the  ensemble 
average  of  decoding  complexity  is  finite.  Then,  Lemma  2.4.1  shows  that 
R0 outer-bounds the outer bound of Theorem 2.4.1 for any given
branchwise  additive  metric. 

Let the following be fixed but otherwise completely arbitrary throughout this section: a channel $K=(P;X_1,\dots,X_n;Y)$, a code parameter $M=(M_1,\dots,M_n,k)$, a branchwise additive metric $\Gamma$ which can be used in decoding codes over K with parameter M, and an ensemble $E=\mathrm{Ens}(M_1,\dots,M_n;k;X_1,\dots,X_n;Q_1,\dots,Q_n)$.

We define $D_L$ to be $E_E D_L(K,e,\Gamma)$ for each L, where $E_E$ denotes expectation with respect to the probability measure associated with E. We also define, as usual, $R_i=(1/k)\ln M_i$, $i=1,\dots,n$; and we let $\gamma$ denote the branch metric for $\Gamma$.

Theorem 2.4.1. If $\inf\{\sigma(T,t,K,\gamma,E): t \ge 0\} > \exp -kR(T)$ for some non-empty subset T of {1,...,n}, then $D_L$ increases without bound as L increases.

Lemma 2.4.1. If $t \ge 0$ and T is a non-empty subset of {1,...,n}, then

\[ -\ln \sigma(T,t,K,\gamma,E) \le k R_0(K,Q,T). \]


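Proof of Lemma 2.4.1. By definition,

\[ \sigma(T,t,K,\gamma,E) = \sum_{\xi,\zeta_T,\eta} Q(\zeta_T) Q(\xi) P(\eta \mid \xi) \exp t\big(\gamma(\zeta_T,\xi_{T^c},\eta) - \gamma(\xi_T,\xi_{T^c},\eta)\big) \]

\[ = \sum_{\xi_{T^c}} Q(\xi_{T^c}) \sum_{\xi_T,\zeta_T,\eta} Q(\xi_T) Q(\zeta_T) P(\eta \mid \xi_T,\xi_{T^c}) \exp t\big(\gamma(\zeta_T,\xi_{T^c},\eta) - \gamma(\xi_T,\xi_{T^c},\eta)\big). \tag{1} \]

Now,

\[ \sum_{\xi_T,\zeta_T,\eta} Q(\xi_T) Q(\zeta_T) P(\eta \mid \xi_T,\xi_{T^c}) \exp t\big(\gamma(\zeta_T,\xi_{T^c},\eta) - \gamma(\xi_T,\xi_{T^c},\eta)\big) \]

\[ = \Big\{ \sum_{\xi_T,\zeta_T,\eta} Q(\xi_T) Q(\zeta_T) P(\eta \mid \xi_T,\xi_{T^c}) \exp t\big(\gamma(\zeta_T,\xi_{T^c},\eta) - \gamma(\xi_T,\xi_{T^c},\eta)\big) \times \sum_{\xi_T,\zeta_T,\eta} Q(\xi_T) Q(\zeta_T) P(\eta \mid \zeta_T,\xi_{T^c}) \exp t\big(\gamma(\xi_T,\xi_{T^c},\eta) - \gamma(\zeta_T,\xi_{T^c},\eta)\big) \Big\}^{1/2} \tag{2} \]

\[ \ge \sum_{\xi_T,\zeta_T,\eta} Q(\xi_T) Q(\zeta_T) \sqrt{P(\eta \mid \xi_T,\xi_{T^c}) P(\eta \mid \zeta_T,\xi_{T^c})}, \tag{3} \]

where (2) follows by reversing the roles of $\xi_T$ and $\zeta_T$, and (3) follows by Cauchy's inequality. (For arbitrary non-negative reals $a_i$, $b_i$, Cauchy's inequality states that $(\sum_i a_i \sum_i b_i)^{1/2} \ge \sum_i \sqrt{a_i b_i}$, with equality iff, for some constant c, $a_i = c b_i$ for all i.)

Substituting (3) into (1), we get

\[ \sigma(T,t,K,\gamma,E) \ge \sum_{\xi_{T^c}} Q(\xi_{T^c}) \sum_{\xi_T,\zeta_T,\eta} Q(\xi_T) Q(\zeta_T) \sqrt{P(\eta \mid \xi_T,\xi_{T^c}) P(\eta \mid \zeta_T,\xi_{T^c})} \]

\[ = \exp -k R_0(K,Q,T), \]

which is the desired result.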


Proof of Theorem 2.4.1. Let the nodes at level L be labelled by integers $1,\dots,M_L$, where $M_L$ denotes the total number of nodes at level L. Let $\Gamma_k^i$ denote the value of the metric at the kth node on the path to level-L node i. $\Gamma_k^i$ is thus a random variable whose distribution is determined by the source, channel, and ensemble statistics.

For any pair of nodes i and j at level L, let us define

A(i,j) = the event that $\min\{\Gamma_k^j : 1 \le k \le L\} > \min\{\Gamma_k^i : 1 \le k \le L\}$,

B(i,j) = the event that $\Gamma_k^j > \Gamma_k^i$ for each k, $1 \le k \le L$, and

C(i,j) = the event that $\Gamma_L^j > \Gamma_L^i$.

Let $P_i$ denote probabilities conditional on node i being the correct node at level L.

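Theorem 2.4.1 follows by the following sequence of inequalities, each of which is justified subsequently.

\[ L D_L \ge \sum_{i=1}^{M_L} (1/M_L) \sum_{j \ne i} P_i(A(i,j)) \tag{4} \]

\[ \ge \sum_{i=1}^{M_L} (1/M_L) \sum_{j \ne i} P_i(B(i,j)) \tag{5} \]

\[ \ge \sum_{i=1}^{M_L} (1/(L M_L)) \sum_{j:\ \text{type of } j \text{ wrt } i = (T,\dots,T)} P_i(C(i,j)) \quad \text{(for any non-empty } T\text{)} \tag{6} \]

\[ \ge \sum_{i=1}^{M_L} (1/(L M_L)) \sum_{j:\ \text{type of } j \text{ wrt } i = (T,\dots,T)} (c/\sqrt{L}) \big(\inf\{\sigma(T,t,K,\gamma,E): t \ge 0\}\big)^L \tag{7} \]

\[ \ge \exp\{k(L-1)R(T)\}\, (c/L^{3/2})\, \big(\inf\{\sigma(T,t,K,\gamma,E): t \ge 0\}\big)^L . \tag{8} \]

Supposing for a moment that (4)-(8) hold, it immediately follows that, if $\inf\{\sigma(T,t,K,\gamma,E): t \ge 0\} > \exp -kR(T)$ for some T, then $D_L$ goes to infinity as L increases. So, the proof will be complete if we prove (4)-(8).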

Proof  of  (4). 

If there exists a node i at level L such that $P_i$(i never reaches the stack-top) > 0, then $m D_m = \infty$ for all $m \ge L$. So, without loss of generality, we may assume that $P_i$(i never reaches the stack-top) = 0 for each node i at level L and each level L.

Let  i  be  the  correct  node  at  level  L.  If  A(i,j)  occurs,  then,  by  the 
properties  of  the  stack  algorithm,  i  cannot  reach  the  stack-top  before  j. 
But,  by  assumption,  i  reaches  the  stack-top  with  probability  one;  it 
follows that $P_i(A(i,j))$ is a lower bound to the probability that j reaches

the  stack-top  before  i,  conditional  on  i  being  correct.  Summing  over  j, 
we  obtain  a  lower  bound  to  the  expected  number  of  nodes  which  reach 
the  stack-top  before  i,  conditional  on  i  being  correct;  averaging  over  i, 
we  obtain  (4). 

Proof  of  (5). 

This follows by the fact that B(i,j) is a subset of A(i,j). To see this, suppose that B(i,j) occurs; in other words, suppose that $\Gamma_k^j > \Gamma_k^i$ for each k, $1 \le k \le L$. Now, by taking the minimum of the right side, we obtain $\Gamma_k^j > \min\{\Gamma_m^i : 1 \le m \le L\}$, which holds for each k. Taking the minimum of both sides of $\Gamma_k^j > \min\{\Gamma_m^i : 1 \le m \le L\}$ over k, we see that whenever B(i,j) occurs so does A(i,j); hence, B(i,j) is a subset of A(i,j).

Proof  of  (6). 

We wish to prove that, for any two nodes i and j, if the type of j with respect to i is uniform, i.e., if it equals (T,...,T) for some non-empty subset T of {1,...,n}, then $P_i(B(i,j)) \ge (1/L) P_i(C(i,j))$. We do this with the help of the following fact.

Claim. Let $Z_1,\dots,Z_L$ be iid (independent, identically-distributed) random variables. Let C be the event that $Z_1+\dots+Z_L > 0$. Let B be the event that

\[ \sum_{i=1}^{m} Z_i > 0 \quad \text{for each } m,\ 1 \le m \le L. \]

Then, $P(B) \ge (1/L) P(C)$.

Proof of the Claim. Suppose that C occurs; that is, suppose that a sample point $\omega$ occurs such that $Z_1(\omega)+\dots+Z_L(\omega) > 0$. Let h be the maximum index such that $Z_1(\omega)+\dots+Z_h(\omega) = \min\{Z_1(\omega)+\dots+Z_k(\omega) : 1 \le k \le L\}$. Consider the cyclic permutation $Z_{h+1}(\omega),\dots,Z_L(\omega),Z_1(\omega),\dots,Z_h(\omega)$; observe that all partial sums for this permutation, namely $Z_{h+1}(\omega)$, $Z_{h+1}(\omega)+Z_{h+2}(\omega)$, and so on, are positive.

So, if $Z_1(\omega)+\dots+Z_L(\omega) > 0$, then there exists a cyclic permutation for which all partial sums are positive. Since there are L cyclic permutations and since each permutation (cyclic or non-cyclic) of a given realization is equally likely to occur, the claim follows.
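The cyclic-permutation step in the proof of the claim can be checked mechanically. The sketch below picks h as the last index at which the running sum attains its minimum and rotates the sequence to start at position h+1; the function names and the use of integer test data are illustrative assumptions.

```python
import random

def positive_rotation(z):
    """Rotate z to start just after the last index h (1-based) at which the
    running sum z[0]+...+z[h-1] attains its minimum over h = 1..len(z)."""
    total, best, h = 0, float("inf"), 0
    for k, x in enumerate(z, start=1):
        total += x
        if total <= best:          # "<=" keeps the maximum minimizing index
            best, h = total, k
    return z[h:] + z[:h]

def all_partial_sums_positive(z):
    """True iff every partial sum z[0]+...+z[m-1], m = 1..len(z), is > 0."""
    s = 0
    for x in z:
        s += x
        if s <= 0:
            return False
    return True

# Whenever the total is positive, the chosen rotation has positive partial sums.
rng = random.Random(0)
for _ in range(500):
    z = [rng.randint(-5, 5) for _ in range(8)]
    if sum(z) > 0:
        assert all_partial_sums_positive(positive_rotation(z))
```

Since at least one of the L rotations of any positive-sum realization lands in B, and rotations of an iid sequence are equally likely, this gives the factor 1/L in the claim.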

The proof follows by substituting $(\Gamma_k^j - \Gamma_{k-1}^j) - (\Gamma_k^i - \Gamma_{k-1}^i)$ in place of $Z_k$ in the above claim. Notice that the condition that j be of type (T,...,T) with respect to i ensures that the random variables $(\Gamma_k^j - \Gamma_{k-1}^j) - (\Gamma_k^i - \Gamma_{k-1}^i)$, k=1,...,L, are identically-distributed.

Proof  of  (7). 

We want to prove that, for any L, any non-empty T, and any pair of nodes i and j at level L, if the type of j wrt i is (T,...,T), then

\[ P_i(C(i,j)) \ge (c/\sqrt{L}) \big(\inf\{\sigma(T,t,K,\gamma,E): t \ge 0\}\big)^L \tag{9} \]

where c is a constant.

Let $Z_k = (\Gamma_k^j - \Gamma_{k-1}^j) - (\Gamma_k^i - \Gamma_{k-1}^i)$ for each k=1,...,L. Note that $Z_1,\dots,Z_L$ are iid random variables with a moment generating function $\sigma(T,t,K,\gamma,E)$. Now, we have $P_i(C(i,j)) \ge P_i(Z_1+\dots+Z_L > 0)$; so, C(i,j) is the event that the sum of L iid random variables exceeds zero.

If $Z_k$ has a non-negative expected value (this corresponds to the situation where the metric tends to increase on a branch of type T at least as fast as it does on a correct branch), then $P_i(C(i,j)) \ge 1/2$ and $\inf\{\sigma(T,t,K,\gamma,E): t \ge 0\} = 1$; so, in this case, (9) is easily satisfied by taking, say, c=1/2.

So, without loss of generality, we may assume that the expected value of $Z_k$ is negative. In which case, (9) follows directly from the asymptotic form of the Chernoff bound, as given by equations 5.4.23 and 5.4.24 of [12].
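To make the role of the minimized moment generating function concrete, here is a small sketch for a toy case that is not taken from the thesis: $Z_k = +1$ with probability p and $-1$ otherwise, with p < 1/2 so that the mean is negative. It compares the exact tail probability with $\mu^L$, where $\mu = \inf_{t \ge 0} E\,e^{tZ} = 2\sqrt{p(1-p)}$. Only the standard Chernoff upper bound is asserted; the printed ratio merely lets one observe how the gap between the two behaves as L grows.

```python
import math

def tail_prob(L, p):
    """Exact P(Z_1+...+Z_L > 0) for iid Z_k = +1 w.p. p and -1 w.p. 1-p."""
    return sum(math.comb(L, u) * p**u * (1 - p)**(L - u)
               for u in range(L + 1) if 2 * u - L > 0)

def min_mgf(p):
    """inf over t >= 0 of p*e^t + (1-p)*e^(-t), valid for p <= 1/2."""
    return 2 * math.sqrt(p * (1 - p))

p = 0.2  # hypothetical parameter
for L in (10, 40, 160):
    mu_L = min_mgf(p) ** L
    tail = tail_prob(L, p)
    assert tail <= mu_L            # Chernoff upper bound
    print(L, tail, mu_L, tail * math.sqrt(L) / mu_L)
```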


Proof  of  (8). 

(8) follows from (7) by noting that $\exp k(L-1)R(T)$ is a lower bound on the number of nodes at level L which are of type (T,...,T) wrt (any given) level-L node i. (Also note that $\exp kLR(T)$ is larger than the number of nodes in question.) This completes the proof of Theorem 2.4.1.

Chapter 3

OUTER  BOUNDS  TO  THE  ACHIEVABLE  RATE  REGION  OF 
SEQUENTIAL  DECODING 


3.1.  A  Basic  Lemma 
Definition  3.1.1. 

For any channel $K=(P;X_1,\dots,X_n;Y)$ and any block code f over K with block length N and codewords f(1),...,f(M), define

\[ X(K,f) = (1/M) \sum_{i=1}^{M} \sum_{j=1}^{M} P(B(i,j) \mid f(i)), \]

where, for each i and j,

\[ B(i,j) = \begin{cases} \{\eta \in Y^N : P(\eta \mid f(i)) \le P(\eta \mid f(j))\} & \text{if } i \ne j, \\ \emptyset & \text{if } i = j. \end{cases} \qquad \Box \]

X(K,f)  is  the  expected  number  of  incorrect  codewords  which  are  at  least 
as likely as the correct codeword conditional on the received word $\eta$,
assuming  that  each  codeword  is  a  priori  equally  likely.  X(K,f)  will  be 
used  in  lower-bounding  the  expected  computation  in  sequential  decoding. 
The  link  between  block  codes  and  sequential  decoding  is  established  by 
Lemma  3.1.1,  which  will  be  given  after  developing  some  concepts. 
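For small codes, X(K,f) can be computed by brute force directly from Definition 3.1.1. The sketch below does so for a single-user memoryless channel; the table layout `P[a][b]`, the function names, and the two-codeword example over a binary symmetric channel are assumptions made for illustration only.

```python
from itertools import product

def word_prob(P, x, y):
    """P(y | x) for a memoryless channel with letter transition table P[a][b]."""
    p = 1.0
    for a, b in zip(x, y):
        p *= P[a][b]
    return p

def expected_rivals(P, outputs, code):
    """X(K,f): average over codewords i of sum_{j != i} P(B(i,j) | f(i)),
    where B(i,j) collects the outputs y with P(y|f(i)) <= P(y|f(j))."""
    N, M = len(code[0]), len(code)
    total = 0.0
    for y in product(outputs, repeat=N):
        probs = [word_prob(P, c, y) for c in code]
        for i in range(M):
            rivals = sum(1 for j in range(M) if j != i and probs[i] <= probs[j])
            total += probs[i] * rivals
    return total / M

# Hypothetical example: BSC with crossover 0.1 and the length-2 repetition code.
bsc = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}
value = expected_rivals(bsc, (0, 1), [(0, 0), (1, 1)])
```

For this example the outputs 01 and 10 are ties and count as rivals for both codewords, which is why the definition uses "at least as likely".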

Definition  3.1.2. 

For any channel K, any tree code e over K, and any positive integer t, define $A(K,e,\Gamma,t)$ as the expected number of nodes which reach the stack-top before the correct node at level t, assuming that the stack algorithm is used with $\Gamma$ as its metric, and that a priori each path is equally likely to be the correct one.

For  any  tree  code  e  and  any  positive  integer  t,  let  e(t)  denote  the  block 
code  obtained  by  truncating  e  at  level  t.  □ 


For  the  purposes  of  this  chapter,  it  is  necessary  to  state  explicitly  the 
tie-breaking  rule  for  ordering  those  nodes  in  the  stack  which  have  equal 
metric  values.  The  rule  that  we  shall  use  is  based  on  the  following 
lexicographical  order  on  the  set  of  nodes. 

In our notation, a node u(..j) is associated with a vector (u(1),...,u(j)), where each u(h), $1 \le h \le j$, belongs to a common set, say S. Any ordering relation on the elements of S induces a lexicographical order on the nodes: For any pair of nodes u(..j) and v(..h), u(..j) precedes v(..h) iff, for some i, $0 \le i \le j-1$, u(..i)=v(..i) and u(i+1) precedes v(i+1) with respect to the order on S.

We  shall  assume  throughout  this  chapter  that  nodes  in  the  stack  with 
equal metric values are ordered in the above lexicographical order. Our
interest in the details of the tie-breaking rule is for purposes of
precision  (and  correctness)  in  the  following  proofs.  For  practical 
purposes,  any  tie-breaking  rule  should  be  as  good  as  any  other. 
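A minimal sketch of such a tie-breaking rule, assuming that stack entries are stored as hypothetical `(metric, path)` pairs with the path given as a tuple of branch labels. Python compares tuples lexicographically, which matches the order described above on paths that differ at some position; it additionally places a path before its own extensions, a case the definition above does not need to settle.

```python
def stack_order_key(entry):
    """Sort key: larger metric first; equal metrics broken by the
    lexicographical order of the path labels."""
    metric, path = entry
    return (-metric, path)

# Hypothetical stack contents: (metric value, path from the root).
stack = [(1.5, (0, 1)), (2.0, (1, 0)), (1.5, (0, 0)), (2.0, (0, 1, 1))]
stack.sort(key=stack_order_key)
top = stack[0]
```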

Lemma 3.1.1. $A(K,e,\Gamma,t) \ge (1/2)\, X(K,e(t))$. (1)

Remark. Observe that X(K,e(t)) is the expected number of level-t nodes

which,  conditional  on  the  first  t  blocks  of  the  received  sequence,  appear 
at  least  as  likely  as  the  correct  node  at  level  t.  Lemma  3.1.1  thus 
implies  that  the  average  decoding  complexity  in  sequential  decoding 
would  be  minimized  if  the  stack  algorithm  were  able  to  explore  the 
nodes  at  any  given  level  t  in  the  same  order  as  they  are  ordered  with 
respect to their a posteriori likelihoods conditional on the first t blocks
of  the  received  sequence.  Of  course,  no  sequential  decoder  can  actually 
do  this.  So  the  analysis  in  this  chapter  can  be  seen  as  an  attempt  to 
lower-bound  the  average  decoding  complexity  of  sequential  decoding  by 
that  of  an  optimum,  but  unrealizable,  sequential  decoder. 

Proof. Let $K=(P;X_1,\dots,X_n;Y)$ be a channel and e be a tree code for K with parameter $(M_1,\dots,M_n,k)$. Consider the situation where the stack algorithm is used in decoding e with a metric $\Gamma$.

Let the level-t nodes in e be labelled by integers 1,...,M(t), where M(t) is the total number of nodes at level t, namely $M(t) = (M_1 \cdots M_n)^t$. Let e(t,i) denote the encoded sequence for the ith level-t node in e. We shall regard e(t,i) also as the ith codeword of e(t).


Claim.

\[ A(K,e,\Gamma,t) \ge (1/M(t)) \sum_{i=1}^{M(t)} \sum_{j=1}^{M(t)} P(A(i,j) \mid e(t,i)), \tag{2} \]

where, by definition, for each pair of distinct level-t nodes i and j,

$A(i,j) = \{\eta \in Y^{kt}$ : i cannot reach the stack-top before j given that $\eta$ is the first t blocks of the received sequence$\}$;

and for each level-t node i, $A(i,i) = \emptyset$.

The  definition  of  A(i,j)  would  not  be  meaningful  if  the  stack  algorithm 
(equipped  with  the  lexicographical  order  discussed  above)  did  not  have 
the  property  that,  given  any  two  nodes  at  level  t,  in  order  to  determine 
which  of  them  reaches  the  stack-top  first,  if  any  reaches  it  at  all,  we 
need  to  know  only  the  first  t  blocks  of  the  received  sequence.  In  other 
words,  given  a  node,  the  first  t  blocks  of  the  received  sequence,  in 
general,  do  not  tell  us  if  that  node  reaches  the  stack-top;  but  given  any 
two  nodes,  they  tell  which  of  the  nodes  cannot  reach  the  stack-top 
before  the  other. 

An explicit characterization of A(i,j) can be given as follows. For any level-t node i, let $\min\Gamma(i,\eta)$ be the minimum of the metric values of the nodes on the path to node i, given that $\eta$ is received. Now, for any two distinct level-t nodes i and j, and any $\eta \in Y^{kt}$,

$\eta \in A(i,j)$ if $\min\Gamma(j,\eta) > \min\Gamma(i,\eta)$, or if $\min\Gamma(j,\eta) = \min\Gamma(i,\eta)$ and j precedes i with respect to the lexicographical order;

$\eta \in A(j,i)$ otherwise.

Thus, A(i,j) and A(j,i) are complementary sets (in $Y^{kt}$), a fact which will be used in what follows.

Proof  of  the  Claim. 

If  the  probability  that  the  correct  node  at  level  t  never  reaches  the 
stack-top  is  positive,  then  A(K,e,r,t)  is  infinite.  So,  without  loss  of 
generality,  we  may  assume  that  the  code  and  the  metric  are  such  that 
the  correct  node  at  level  t  reaches  the  stack-top  with  probability  one. 

Suppose  that  node  i  is  the  correct  node  at  level  t.  Let  j  be  some  other 
level-t  node.  Since  i,  being  the  correct  node,  reaches  the  stack-top  with 
certainty,  the  probability  that  j  reaches  the  stack-top  before  i  equals 
$P(A(i,j) \mid e(t,i))$. Thus,

\[ \sum_{j=1}^{M(t)} P(A(i,j) \mid e(t,i)) \tag{3} \]

is  the  expected  number  of  level-t  nodes  which  reach  the  stack-top 
before  node  i,  conditional  on  i  being  correct.  Averaging  (3)  over  i,  we 
obtain  (2),  thus  concluding  the  proof  of  the  claim. 

Now,  the  proof  of  Lemma  3.1.1  is  completed  as  follows. 


\[ 2A(K,e,\Gamma,t) \ge (1/M(t)) \sum_{i=1}^{M(t)} \sum_{j=1}^{M(t)} \big[ P(A(i,j) \mid e(t,i)) + P(A(j,i) \mid e(t,j)) \big] \tag{4} \]

\[ \ge (1/M(t)) \sum_{i=1}^{M(t)} \sum_{j=1,\, j \ne i}^{M(t)} \sum_{\eta \in Y^{kt}} \min\{P(\eta \mid e(t,i)), P(\eta \mid e(t,j))\} \tag{5} \]

\[ \ge (1/2M(t)) \sum_{i=1}^{M(t)} \sum_{j=1}^{M(t)} \big[ P(B(i,j) \mid e(t,i)) + P(B(j,i) \mid e(t,j)) \big] \tag{6} \]

\[ = (1/M(t)) \sum_{i=1}^{M(t)} \sum_{j=1}^{M(t)} P(B(i,j) \mid e(t,i)) \]

\[ = X(K,e(t)). \]

Here, (5) follows from (4) by the complementarity of A(i,j) and A(j,i) for $i \ne j$; in (6) we divide by 2 to account for the fact that, for $i \ne j$, B(i,j) and B(j,i) have in common those $\eta$ for which $P(\eta \mid e(t,i)) = P(\eta \mid e(t,j))$. □

The  following  sections  of  this  chapter  are  devoted  to  finding  outer 
bounds  to  the  achievable  rate  region  of  sequential  decoding  (to  be  exact, 
of  the  stack  algorithm  with  the  particular  tie-breaking  rule  described 
above)  in  various  situations.  These  bounds  are  based  on  the  fact  that,  if 
X(K,e(t))  grows  without  bound  as  t  increases,  then  by  Lemma  3.1.1,  the 
average  complexity  of  sequential  decoding  must,  too,  be  unbounded. 


3.2. The Cut-off Rate of Single-User Channels


The  main  result  of  this  section  is  the  proof  that  R0(K)  is  the  cut-off 
rate  of  sequential  decoding  for  any  single-user  discrete  memoryless 
channel  (DMC)  K.  This  proof  relies  heavily  on  certain  results  about 
sphere-packing  lower  bounds  to  the  probability  of  decoding  error  for 
block  codes,  which  we  review  in  the  following  subsection. 

3.2.1.  Sphere-Packing  Lower  Bounds 

Probabilities  of  Error 

Let K=(P;X;Y) be a DMC and let f be a block code for this channel with rate R, block length N, and number of codewords M ($M = e^{NR}$). Denote the codewords of f by f(1),...,f(M). Let $d=(Y_1,\dots,Y_M)$ be a decoder for f. Here, $Y_1,\dots,Y_M$ are disjoint sets whose union is $Y^N$, and the decoder decides in favor of message i if the received word belongs to $Y_i$.

$P(Y_i^c \mid f(i))$ is then the probability of decoding error for message i.

The average probability of decoding error is defined as

\[ P_e(K,f,d) = (1/M) \sum_{i=1}^{M} P(Y_i^c \mid f(i)). \]

The maximum probability of decoding error is defined as

\[ P_{e,\max}(K,f,d) = \max_{1 \le i \le M} P(Y_i^c \mid f(i)). \]

$P_e(K,M,N)$ is defined as the minimum of $P_e(K,f,d)$ over all codes f with M codewords and block length N, and all decoders d.

We shall give lower bounds to $P_e(K,f,d)$ and $P_{e,\max}(K,f,d)$; but first more definitions are needed.


A p.d. Q on X is said to be the composition of $\boldsymbol{\xi} \in X^N$ iff, for each $\xi \in X$, $NQ(\xi)$ equals the number of times $\xi$ appears in $\boldsymbol{\xi}$. A p.d. Q on X is said to be a composition class on $X^N$ iff $NQ(\xi)$ is integer-valued for each $\xi \in X$. A code is called a fixed-composition code iff all of its codewords have the same composition.
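The composition of a word is simply its empirical letter distribution. The following sketch computes it with exact fractions and counts composition classes by brute force for a small alphabet; the count can then be compared with the bound $(1+N)^{|X|}$ that is used later in the proof of Lemma 3.2.5. Function names and the example alphabet are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def composition(word, alphabet):
    """Empirical distribution Q with N*Q(x) = number of times x occurs in word."""
    counts = Counter(word)
    N = len(word)
    return {x: Fraction(counts[x], N) for x in alphabet}

def num_composition_classes(N, alphabet):
    """Number of distinct compositions among all words of length N (brute force)."""
    return len({tuple(sorted(Counter(w).items()))
                for w in product(alphabet, repeat=N)})

Q = composition("aab", "ab")
classes = num_composition_classes(3, "ab")   # compare with (1 + 3) ** 2
```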

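For any channel K=(P;X;Y), any positive real number R, and any p.d. Q on X, the sphere-packing exponent, $E_{sp}(K,R,Q)$, is defined as

\[ E_{sp}(K,R,Q) = \min_V D(V|P|Q) \]

subject to $V(\eta \mid \xi) \ge 0$ for each $\xi \in X$ and $\eta \in Y$, $\sum_{\eta \in Y} V(\eta \mid \xi) = 1$ for each $\xi \in X$, and $R \ge I(Q;V)$.

Here,

\[ D(V|P|Q) = \sum_{\xi \in X} \sum_{\eta \in Y} Q(\xi) V(\eta \mid \xi) \ln\{V(\eta \mid \xi) / P(\eta \mid \xi)\} \]

and

\[ I(Q;V) = \sum_{\xi \in X} \sum_{\eta \in Y} Q(\xi) V(\eta \mid \xi) \ln\Big\{ V(\eta \mid \xi) \Big/ \sum_{\zeta \in X} Q(\zeta) V(\eta \mid \zeta) \Big\}. \]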

Lemma 3.2.1. Sphere-Packing Lower Bound for Fixed-Composition Codes
Let K=(P;X;Y) be a channel, N be a positive integer, Q be a composition class on $X^N$, R and $\delta$ be positive real numbers. Let f be a fixed-composition code with composition Q, block length N, and number of codewords M. Suppose that $M \ge \exp N(R+\delta)$. Let d be a decoder for f. Then, for any such K, f, and d,

\[ P_{e,\max}(K,f,d) \ge (1/2) \exp -N\{E_{sp}(K,R,Q)(1+\delta)\} \]

provided that $N \ge N_0(\delta,|X|,|Y|)$, for some function $N_0$. □


This  is  Theorem  5.3  in  [16],  and  hence,  its  proof  will  be  omitted  here. 


The explicit form of the function $N_0$ is not important for our purposes (it can be found in [16]); what is important is the fact that $N_0$ does not depend on Q.

Corollary  3.2.1. 

For any K, N, Q, R, $\delta$, f, M, and d as in Lemma 3.2.1, satisfying the additional condition $(M-1)/2 \ge \exp N(R+\delta)$,

\[ P_e(K,f,d) \ge (1/4) \exp -N\{E_{sp}(K,R,Q)(1+\delta)\}, \]

provided that $N \ge N_0(\delta,|X|,|Y|)$.

Proof. We make use of an idea of [17] (Eq. 4.41): if $(1/N)\ln[(M-1)/2] \ge R+\delta$ and $N \ge N_0(\delta,|X|,|Y|)$, then, by Lemma 3.2.1, at least half of the codewords of f have probability of error greater than

\[ (1/2) \exp -N\{E_{sp}(K,R,Q)(1+\delta)\}; \]

the corollary follows by noting that such codewords have probability of occurrence of at least one half. □


Lemma 3.2.2. Some Properties of $E_{sp}(K,R,Q)$

For fixed K=(P;X;Y) and Q, $E_{sp}(K,R,Q)$ is a convex, non-increasing function of R>0. $E_{sp}(K,R,Q)$ is positive for $0<R<I(Q;P)$ and zero for $R \ge I(Q;P)$. There is a rate $R_c(K,Q)$, called the critical rate for Q, which has the property

\[ R_c(K,Q) + E_{sp}(K,R_c(K,Q),Q) = E_0(K,Q), \]

where, by definition,

\[ E_0(K,Q) = \min_V D(V|P|Q) + I(Q;V) \]

s.t. $V(\eta \mid \xi) \ge 0$ for all $\xi \in X$ and $\eta \in Y$, and $\sum_{\eta \in Y} V(\eta \mid \xi) = 1$ for all $\xi \in X$.

The assertions of this lemma are contained in Lemma 5.4 and Corollary 5.4 of [16]; hence, their proofs are omitted here.

Lemma 3.2.3. For any K and Q, $R_0(K) \ge E_0(K,Q) \ge R_0(K,Q)$.

Proof. We follow the hints given in problem 5.23 of [16]. The dependence of the functions on K will be suppressed in the following proof. First it will be shown that $R_0 \ge E_0(Q)$.

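\[ E_0(Q) = \min_V D(V|P|Q) + I(Q;V) \tag{1} \]

\[ = \min_{V,U} \sum_{\xi \in X} \sum_{\eta \in Y} Q(\xi) V(\eta \mid \xi) \big\{ \ln\{V(\eta \mid \xi)/P(\eta \mid \xi)\} + \ln\{V(\eta \mid \xi)/U(\eta)\} \big\}, \tag{2} \]

where U is a probability distribution on Y. (2) follows from (1) by noting that

\[ I(Q;V) = \min_U \sum_{\xi \in X} \sum_{\eta \in Y} Q(\xi) V(\eta \mid \xi) \ln\{V(\eta \mid \xi)/U(\eta)\}, \tag{3} \]

which can be proved by considering the difference of the two sides in (3) for fixed U, and then using Jensen's inequality.

Now, note that

\[ \sum_{\xi \in X} \sum_{\eta \in Y} Q(\xi) V(\eta \mid \xi) \big\{ \ln\{V(\eta \mid \xi)/P(\eta \mid \xi)\} + \ln\{V(\eta \mid \xi)/U(\eta)\} \big\} \]

\[ = -2 \sum_{\xi \in X} Q(\xi) \sum_{\eta \in Y} V(\eta \mid \xi) \ln\Big\{ \sqrt{P(\eta \mid \xi) U(\eta)} \big/ V(\eta \mid \xi) \Big\} \tag{4} \]

\[ \ge -2 \sum_{\xi \in X} Q(\xi) \ln\Big\{ \sum_{\eta \in Y} V(\eta \mid \xi) \Big[ \sqrt{P(\eta \mid \xi) U(\eta)} \big/ V(\eta \mid \xi) \Big] \Big\} \tag{5} \]

\[ = -2 \sum_{\xi \in X} Q(\xi) \ln\Big\{ \sum_{\eta \in Y} \sqrt{P(\eta \mid \xi) U(\eta)} \Big\}, \tag{6} \]

where (5) follows from (4) by Jensen's inequality; and equality holds in (5) if V is as follows.

\[ V(\eta \mid \xi) = \frac{\sqrt{P(\eta \mid \xi) U(\eta)}}{\sum_{\eta' \in Y} \sqrt{P(\eta' \mid \xi) U(\eta')}} \]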

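From (1)-(6), it follows that

\[ E_0(Q) = \min_U\ -2 \sum_{\xi \in X} Q(\xi) \ln\Big\{ \sum_{\eta \in Y} \sqrt{P(\eta \mid \xi) U(\eta)} \Big\}. \tag{7} \]

So, for any p.d. U on Y,

\[ E_0(Q) \le -2 \sum_{\xi \in X} Q(\xi) \ln\Big\{ \sum_{\eta \in Y} \sqrt{P(\eta \mid \xi) U(\eta)} \Big\}. \tag{8} \]

In particular, we may take U in (8) to be

\[ U^*(\eta) = \frac{\big\{ \sum_{\xi \in X} Q^*(\xi) \sqrt{P(\eta \mid \xi)} \big\}^2}{\sum_{\eta' \in Y} \big\{ \sum_{\xi \in X} Q^*(\xi) \sqrt{P(\eta' \mid \xi)} \big\}^2} \quad \text{for each } \eta \in Y, \]

where $Q^*$ is a p.d. that maximizes $R_0(Q)$, i.e., $R_0(Q^*) = R_0$. By Theorem 5.6.5 of [12], $Q^*$ has the property that

\[ \sum_{\eta \in Y} \sqrt{P(\eta \mid \xi)} \sum_{\zeta \in X} Q^*(\zeta) \sqrt{P(\eta \mid \zeta)} \ \ge\ \sum_{\eta \in Y} \Big\{ \sum_{\zeta \in X} Q^*(\zeta) \sqrt{P(\eta \mid \zeta)} \Big\}^2 \tag{9} \]

for each $\xi \in X$, with equality if $Q^*(\xi) > 0$.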


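Substituting $U^*$ into (8), we get

\[ E_0(Q) \le -2 \sum_{\xi \in X} Q(\xi) \ln\Big\{ \sum_{\eta \in Y} \sqrt{P(\eta \mid \xi) U^*(\eta)} \Big\} \tag{10} \]

\[ = -2 \sum_{\xi \in X} Q(\xi) \ln \frac{\sum_{\eta \in Y} \sqrt{P(\eta \mid \xi)} \sum_{\zeta \in X} Q^*(\zeta) \sqrt{P(\eta \mid \zeta)}}{\sqrt{\sum_{\eta \in Y} \big\{ \sum_{\zeta \in X} Q^*(\zeta) \sqrt{P(\eta \mid \zeta)} \big\}^2}} \]

\[ \le R_0. \tag{11} \]

(11) follows by the property of $Q^*$ expressed in (9). This completes the proof of the first half of the lemma. We now prove that $E_0(Q) \ge R_0(Q)$ for all Q.

\[ E_0(Q) = \min_U\ -2 \sum_{\xi \in X} Q(\xi) \ln\Big\{ \sum_{\eta \in Y} \sqrt{P(\eta \mid \xi) U(\eta)} \Big\} \tag{12} \]

\[ \ge \min_U\ -2 \ln\Big\{ \sum_{\xi \in X} Q(\xi) \sum_{\eta \in Y} \sqrt{P(\eta \mid \xi) U(\eta)} \Big\} \tag{13} \]

\[ = R_0(Q), \tag{14} \]

where (12) is just a restatement of (7); (13) follows by Jensen's inequality; and (14) follows by substituting the minimizing U, which is

\[ U(\eta) = \frac{\big\{ \sum_{\xi \in X} Q(\xi) \sqrt{P(\eta \mid \xi)} \big\}^2}{\sum_{\eta' \in Y} \big\{ \sum_{\xi \in X} Q(\xi) \sqrt{P(\eta' \mid \xi)} \big\}^2} \quad \text{for each } \eta \in Y. \quad \Box \]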
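The quantities in this proof are easy to evaluate for small channels. The sketch below assumes the single-user form $R_0(Q) = -\ln \sum_\eta \{\sum_\xi Q(\xi)\sqrt{P(\eta \mid \xi)}\}^2$ and evaluates the right side of (8) at the distribution U named at the end of the proof. By (8) that value upper-bounds $E_0(Q)$, and by the Jensen step (13) it is at least $R_0(Q)$. The function names and the two example channels are illustrative assumptions.

```python
import math

def r0(P, Q):
    """R0(Q) = -ln sum_y (sum_x Q[x]*sqrt(P[x][y]))**2 for a single-user DMC."""
    ny = len(P[0])
    return -math.log(sum(sum(q * math.sqrt(row[y]) for q, row in zip(Q, P)) ** 2
                         for y in range(ny)))

def e0_upper(P, Q, U):
    """Right side of (8): -2 sum_x Q[x] ln sum_y sqrt(P[x][y]*U[y])."""
    return -2 * sum(q * math.log(sum(math.sqrt(row[y] * U[y]) for y in range(len(U))))
                    for q, row in zip(Q, P) if q > 0)

def minimizing_u(P, Q):
    """U(y) proportional to (sum_x Q[x]*sqrt(P[x][y]))**2."""
    a = [sum(q * math.sqrt(row[y]) for q, row in zip(Q, P)) ** 2
         for y in range(len(P[0]))]
    s = sum(a)
    return [v / s for v in a]

# Hypothetical examples: a Z-channel with a skewed input distribution, and a BSC.
z_channel = [[1.0, 0.0], [0.3, 0.7]]
Qz = [0.6, 0.4]
gap = e0_upper(z_channel, Qz, minimizing_u(z_channel, Qz)) - r0(z_channel, Qz)

bsc = [[0.9, 0.1], [0.1, 0.9]]
r0_bsc = r0(bsc, [0.5, 0.5])
```

For the symmetric BSC with uniform Q the two values coincide, which is consistent with $R_0(Q) \le E_0(Q) \le R_0$ holding with equality at the maximizing input distribution.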


Corollary 3.2.2. $\max_Q E_0(K,Q) = R_0(K)$ for all K.

Proof. By Lemma 3.2.3,

\[ R_0(K,Q) \le E_0(K,Q) \le R_0(K); \]

hence,

\[ \max_Q R_0(K,Q) \le \max_Q E_0(K,Q) \le R_0(K). \]

The proof follows by noting that $\max_Q R_0(K,Q) = R_0(K)$.

Corollary 3.2.3. $\max_Q R_c(K,Q) \le R_0(K)$ for all K.

Proof. $R_c(K,Q) \le E_0(K,Q)$ by Lemma 3.2.2, and $E_0(K,Q) \le R_0(K)$ by Lemma 3.2.3. Hence, $R_c(K,Q) \le R_0(K)$ for all K and Q.

3.2.2.  A  Lower  Bound  on  X(K,f) 

Lemma 3.2.4. For any K=(P;X;Y), any code f for K with M codewords and block length N, and any collection of integers $t, M_1,\dots,M_t$ such that i) $t \ge 1$, ii) $M_i \ge 1$ for each $i \in \{1,\dots,t\}$, and iii) $M-1 = \sum_{1 \le i \le t} (M_i - 1)$, one has

\[ X(K,f) \ge P_e(K,M_1,N) + \dots + P_e(K,M_t,N). \]

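Proof. Fix K, f, and $M_1,\dots,M_t$. Let f(1),...,f(M) be the codewords of f. Define

\[ B(i,j) = \begin{cases} \{\eta \in Y^N : P(\eta \mid f(i)) \le P(\eta \mid f(j))\} & \text{if } i \ne j; \\ \emptyset & \text{if } i = j. \end{cases} \]

For each $i \in \{1,\dots,M\}$, define

\[ P_i = \{(S_1,\dots,S_t) : S_1 \cup \dots \cup S_t = \{1,\dots,M\},\ i \in S_j,\ |S_j| = M_j,\ j=1,\dots,t\}. \]

It follows from the definition that, if $(S_1,\dots,S_t) \in P_i$, then the sets $S_1,\dots,S_t$ are mutually disjoint, except for i, which is common to all.

For each subset T of {1,...,M}, define

$E_i(T) = \{\eta \in Y^N$ : there exists $j \in T$ such that $j \ne i$ and $P(\eta \mid f(i)) \le P(\eta \mid f(j))\}$.

Observe that, for any $S=(S_1,\dots,S_t) \in P_i$ and any $i \in \{1,\dots,M\}$,

\[ \sum_{j=1}^{M} P(B(i,j) \mid f(i)) \ge \sum_{k=1}^{t} P(E_i(S_k) \mid f(i)). \]

So, for any p.d. $W_i$ on $P_i$,

\[ \sum_{j=1}^{M} P(B(i,j) \mid f(i)) \ge \sum_{S \in P_i} W_i(S) \sum_{k=1}^{t} P(E_i(S_k) \mid f(i)). \]

Take $W_i$ as the uniform distribution on $P_i$ for each $i \in \{1,\dots,M\}$. Note that the cardinality of $P_i$ equals $c := (M-1)! / \{(M_1-1)! \cdots (M_t-1)!\}$.

Sum over all i to obtain

\[ X(K,f) = (1/M) \sum_{i=1}^{M} \sum_{j=1}^{M} P(B(i,j) \mid f(i)) \ge (1/cM) \sum_{i=1}^{M} \sum_{S \in P_i} \sum_{k=1}^{t} P(E_i(S_k) \mid f(i)). \]

Let $\alpha_k = (1/cM) \sum_{i=1}^{M} \sum_{S \in P_i} P(E_i(S_k) \mid f(i))$.

Now, $X(K,f) \ge \alpha_1 + \dots + \alpha_t$. Clearly, the proof will be complete if we show that $\alpha_k \ge P_e(K,M_k,N)$.

Define F(m) = {D : D is a subset of {1,...,M} with m elements} and $F_i(m)$ = {D : $D \in F(m)$ and $i \in D$}.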


\[ \alpha_k = (1/cM) \sum_{i=1}^{M} \sum_{S \in P_i} P(E_i(S_k) \mid f(i)) \]

\[ = (1/cM) \sum_{i=1}^{M} \sum_{D \in F_i(M_k)} \sum_{S \in P_i : S_k = D} P(E_i(S_k) \mid f(i)) \]

\[ = (1/cM) \sum_{i=1}^{M} \sum_{D \in F_i(M_k)} P(E_i(D) \mid f(i)) \sum_{S \in P_i : S_k = D} 1 \]

\[ = (1/cM) \sum_{i=1}^{M} \sum_{D \in F_i(M_k)} P(E_i(D) \mid f(i))\, \frac{(M-M_k)!\,(M_k-1)!}{(M_1-1)! \cdots (M_t-1)!} \]

\[ = \frac{(M-M_k)!\,(M_k-1)!}{M!} \sum_{i=1}^{M} \sum_{D \in F_i(M_k)} P(E_i(D) \mid f(i)) \]

\[ = \frac{(M-M_k)!\,(M_k-1)!}{M!} \sum_{D \in F(M_k)} \sum_{i \in D} P(E_i(D) \mid f(i)) \]

\[ \ge \frac{(M-M_k)!\,(M_k-1)!}{M!} \sum_{D \in F(M_k)} M_k P_e(K,M_k,N) \]

\[ = P_e(K,M_k,N). \quad \Box \]

Corollary 3.2.4. For any channel K=(P;X;Y), any code f for K with block length N and number of codewords M, and any integer H such that $M \ge 2H$, one has $X(K,f) \ge (M/2H) P_e(K,H,N)$.

Proof. Under the conditions of the corollary, integers $t, M_1,\dots,M_t$ can be found such that $t \ge (M/2H)$ and $M_i \ge H$ for each i. The result follows from Lemma 3.2.4 by noting that $P_e(K,m_1,N) \ge P_e(K,m_2,N)$ for any pair of integers $m_1$ and $m_2$ such that $m_1 \ge m_2$.
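The counting facts used in the proof of Lemma 3.2.4 can be confirmed by brute force for small parameters. The sketch below enumerates the tuples $(S_1,\dots,S_t)$ in $P_i$ directly and compares their number with the multinomial expression for c; it also checks that the final prefactor times $|F(M_k)|\,M_k$ equals one. Function names and test sizes are illustrative assumptions.

```python
from itertools import combinations
from math import comb, factorial

def count_tuples(M, sizes, i=1):
    """Brute-force |P_i|: tuples (S_1..S_t) of subsets of {1..M} that all contain i,
    are otherwise disjoint, cover {1..M}, and satisfy |S_j| = sizes[j]."""
    def rec(rest, remaining):
        if not remaining:
            return 1 if not rest else 0
        m = remaining[0] - 1            # elements of S_j other than i
        return sum(rec(rest - set(c), remaining[1:])
                   for c in combinations(sorted(rest), m))
    return rec(set(range(1, M + 1)) - {i}, list(sizes))

def multinomial_count(M, sizes):
    """c = (M-1)! / ((M_1-1)! ... (M_t-1)!)."""
    c = factorial(M - 1)
    for m in sizes:
        c //= factorial(m - 1)
    return c

def final_factor(M, Mk):
    """(M-Mk)! (Mk-1)! / M!  times  |F(Mk)| * Mk, which should equal 1."""
    return factorial(M - Mk) * factorial(Mk - 1) * comb(M, Mk) * Mk // factorial(M)
```

For example, with M=5 and sizes (3,3) there are six admissible tuples, matching $4!/(2!\,2!)$.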

3.2.3. Proof that R0 is the Cut-off Rate

Lemma 3.2.5. Let $f_1, f_2, \dots$ be an infinite sequence of block codes for a DMC K=(P;X;Y). Let $N_i = ki$ be the block length of $f_i$ for each i, where k is some fixed integer. Let $M_i$ be the number of codewords in $f_i$. Suppose that $M_i \ge \exp N_i(R_0(K)+\epsilon)$ for each i, where $\epsilon$ is a positive constant independent of i. Then, for all sufficiently large i, $(1/N_i) \ln X(K,f_i) \ge \epsilon/8$.

Proof. Let $g_i$ be a subset of $f_i$ with a fixed composition and with number of codewords at least as large as $M_i/(1+N_i)^{|X|}$. (There is no problem in assuming that $g_i$ has this many codewords because $(1+N_i)^{|X|}$ is an upper bound on the number of composition classes on $X^{N_i}$.) Let $L_i$ be the number of codewords in $g_i$, and let $Q_i$ be the composition of the codewords in $g_i$.

Note that $X(K,f_i) \ge (L_i/M_i) X(K,g_i)$, a fact that will be used later in this proof.

Let $\delta = \epsilon/(8+4R_0(K))$.

It is tedious but conceptually straightforward to see that there is a function $\theta(\epsilon,K,|X|,|Y|)$ such that for all $i \ge \theta$ all of the following conditions hold simultaneously.

1. $(1/N_i) \ln L_i \ge R_0(K) + \epsilon/2$ (15)

2. $(1/N_i) \ln(L_i/8M_i) \ge -\epsilon/8$ (16)

3. $N_i \ge N_0(\delta,|X|,|Y|)$ (17)

4. There exist integers $H_i$ such that

a) $L_i \ge 2H_i$ (18)

b) $R_c(K,Q_i) + \delta \le (1/N_i) \ln[(H_i-1)/2]$ (19)

c) $R_c(K,Q_i) + 2\delta \ge (1/N_i) \ln H_i$. (20)

The $N_0$ in (17) is the same as the $N_0$ in Lemma 3.2.1.

To see that (15) and (16) can be satisfied, recall the assumption on the size of $L_i$. To see how (18)-(20) can be satisfied, first note that, for large i, the right hand sides of (19) and (20) are almost identical; thus, (19) and (20) essentially require that $(1/N_i) \ln H_i$ be between $R_c(K,Q_i)+\delta$ and $R_c(K,Q_i)+2\delta$, a condition which can clearly be satisfied. Now, if $L_i$ are chosen to satisfy (16) and $H_i$ are chosen to satisfy (19) and (20), then, for all i sufficiently large, they also satisfy (18) in view of 1) $R_c(K,Q) \le R_0(K)$ (see Corollary 3.2.3) and 2) the relation $\delta = \epsilon/(8+4R_0(K))$.

Hereafter, suppose that i is larger than $\theta$. Let $H_i$ be chosen so that (18)-(20) are satisfied. The rest of the proof is a simple consequence of the results established thus far.


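\[ X(K,f_i) \ge (L_i/M_i) X(K,g_i) \quad \text{(true in general)} \]

\[ \ge (L_i^2/(2M_iH_i))\, P_e(K,H_i,N_i) \quad \text{(by (18) and Corollary 3.2.4)} \]

\[ \ge (L_i^2/(8M_iH_i)) \exp -N_i\{E_{sp}(K,R_c(K,Q_i),Q_i)(1+\delta)\} \quad \text{(by (17), (19), and Corollary 3.2.1)} \]

\[ = (L_i^2/(8M_iH_i)) \exp -N_i\{(1+\delta)[E_0(K,Q_i) - R_c(K,Q_i)]\} \quad \text{(by Lemma 3.2.2)} \]

\[ \ge (L_i^2/(8M_iH_i)) \exp -N_i\{(1+\delta)[R_0(K) - R_c(K,Q_i)]\} \quad \text{(by Corollary 3.2.2)} \]

\[ \ge (L_i/8M_i) \exp N_i\{R_0(K) + \epsilon/2 - R_c(K,Q_i) - 2\delta - (1+\delta)R_0(K) + (1+\delta)R_c(K,Q_i)\} \quad \text{(by (15) and (20))} \]

\[ = (L_i/8M_i) \exp N_i\{\epsilon/2 - 2\delta - \delta R_0(K) + \delta R_c(K,Q_i)\} \]

\[ \ge (L_i/8M_i) \exp N_i\{\epsilon/2 - 2\delta - \delta R_0(K)\} \quad \text{(since } R_c(K,Q_i) \ge 0\text{)} \]

\[ = (L_i/8M_i) \exp N_i \epsilon/4 \quad \text{(since } \delta = \epsilon/\{8+4R_0(K)\}\text{)} \]

\[ \ge \exp N_i \epsilon/8 \quad \text{(by (16)).} \quad \Box \]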
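The choice $\delta = \epsilon/(8+4R_0(K))$ is what makes the exponent collapse to $\epsilon/4$ in the next-to-last step. A short exact-arithmetic check, with hypothetical values for $\epsilon$ and $R_0(K)$:

```python
from fractions import Fraction

def exponent_after_cancellation(eps, r0):
    """Evaluate eps/2 - 2*delta - delta*r0 with delta = eps / (8 + 4*r0)."""
    delta = eps / (8 + 4 * r0)
    return eps / 2 - 2 * delta - delta * r0

eps = Fraction(1, 10)    # hypothetical epsilon
r0 = Fraction(3, 7)      # hypothetical value of R0(K)
assert exponent_after_cancellation(eps, r0) == eps / 4
```

Indeed $2\delta + \delta R_0(K) = \delta\,(2+R_0(K)) = \epsilon/4$, and combining $\epsilon/4$ with the $-\epsilon/8$ from (16) leaves the final exponent $\epsilon/8$.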

Theorem 3.2.1. R0(K) is the cut-off rate of sequential decoding for any single-user DMC K.

Proof. For any single-user DMC K and any (M,k) tree code e for this channel, if $(1/k)\ln M > R_0(K)$, then Lemma 3.2.5 implies that X(K,e(t)) increases exponentially in increasing t. Hence, by Lemma 3.1.1, $A(K,e,\Gamma,t)$, too, increases exponentially in t regardless of what the metric $\Gamma$ is. It follows that rates above R0(K) are not achievable (in the sense of Def. 1.4.2). □


3.3  Proof  that  R0=R  for  Pairwise  Reversible  Channels 


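A channel $K=(P;X_1,\dots,X_n;Y)$ is said to be a pairwise reversible channel (PRC) iff for each $\xi_i \in X_i$ and $\zeta_i \in X_i$, i=1,...,n,

\[ \sum_{\eta \in Y} \sqrt{P(\eta \mid \xi_1,\dots,\xi_n) P(\eta \mid \zeta_1,\dots,\zeta_n)}\ \log\big(P(\eta \mid \xi_1,\dots,\xi_n)/P(\eta \mid \zeta_1,\dots,\zeta_n)\big) = 0. \]

(Here, 0 log 0 = 0.)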

PRC's  were  introduced  by  Shannon,  Gallager,  and  Berlekamp  in  their  study 
of zero-rate error exponents for block codes [17]. Some examples of
PRC's  are  the  two-user  £R  and  erasure  channels  of  §1.5.  Our  purpose  in 
this  section  is  to  prove  the  following  result. 

Theorem  3.3.1.  R0(K)=R(K)  for  any  PRC  K. 

Recall that R0(K) has already been shown to be an inner bound to R(K) for all K (Theorem 2.2.1). Thus, to prove that R0(K)=R(K) for a given K, it suffices to show that R0(K) is an outer bound to R(K). The following result, taken from [17] without proof, is the key to proving this.

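Lemma 3.3.1. For any PRC $K=(P;X_1,\dots,X_n;Y)$, any positive integer N, and any pair of $\xi \in (X_1 \times \dots \times X_n)^N$ and $\zeta \in (X_1 \times \dots \times X_n)^N$,

\[ \sum_{\eta \in Y^N} \min\{P(\eta \mid \xi), P(\eta \mid \zeta)\} \ge g(N) \sum_{\eta \in Y^N} \sqrt{P(\eta \mid \xi) P(\eta \mid \zeta)}, \]

where $g(N) = (1/4) \exp\{\sqrt{2N} \ln P_{\min}\}$ and

\[ P_{\min} = \min\{P(\eta \mid \xi) : \eta \in Y,\ \xi \in (X_1 \times \dots \times X_n),\ \text{and } P(\eta \mid \xi) > 0\}. \]

($P_{\min}$ is thus the smallest non-zero transition probability over K.) □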

Definition 3.4.1. Let f be an (M,N) block code over a symbol alphabet X. A p.d. Q on $X^N$ is said to be the composition of f iff, for each $\xi \in X^N$, $MQ(\xi)$ equals the number of times $\xi$ appears as a codeword of f.


(The  concept  of  composition  here  has  no  relation  to  that  in  the  previous 

section.) 


Lemma 3.3.2. For any PRC K, and any block code f for K,

$$\lambda(K,f)\;\ge\;(1/2)\,g(N)\,\bigl[-1+\exp\{-N\delta(K,M,Q)\}\bigr],$$

where N is the block length of f; g(N) is as in Lemma 3.3.1; M is the
parameter of f (if, say, f is an n-user code, then M is of the form
$(M_1,\dots,M_n,N)$ where $M_i$ is the number of codewords in the ith component
code and N is the common block length); Q is the composition of f. The
function $\delta(K,M,Q)$, as defined in §2.2, is the minimum of $R_0(K,Q,T)-R(T)$
over all T, where T is a non-empty subset of {1,...,n} and R(T) is the sum,
over $i\in T$, of $(1/N)\ln M_i$.

The  proof  of  Lemma  3.3.2  will  be  given  following  that  of  Theorem  3.3.1. 
Proof  of  Theorem  3.3.1. 

Let K be a PRC, and f be a tree code for K with parameter $M=(M_1,\dots,M_n,k)$,
where n denotes the number of users. Let $R=(R_1,\dots,R_n)$ with $R_i=(1/k)\ln M_i$.

Suppose that R does not belong to R0(K). We will show that R does not
belong to R(K), either.

Let f(i) be the block code obtained by truncating f at level i and let $Q_i$
be the composition of f(i). The parameter of f(i), which is denoted by $M^i$,
equals $(M_1^i,\dots,M_n^i,ki)$. The rate of f(i) thus equals $R=(R_1,\dots,R_n)$. Now, by
Lemma 3.3.2,

$$\lambda(K,f(i))\;\ge\;(1/2)\,g(ki)\,\bigl[-1+\exp\{-ki\,\delta(K,M^i,Q_i)\}\bigr]$$
$$\ge\;(1/2)\,g(ki)\,\bigl[-1+\exp\{-ki\,\Delta(K,M)\}\bigr],$$

where, by definition, $\Delta(K,M)=\sup\{\delta(K,M^i,Q_i):\ i=1,2,3,\dots\}$.

Since we assume that R does not belong to R0(K), we have $\Delta(K,M)<0$.
Therefore, $\lambda(K,f(i))$ increases exponentially as i increases. This in turn
implies, by Lemma 3.1.1, that $A(K,f,\Gamma,i)$ increases exponentially as i
increases, regardless of what the metric $\Gamma$ is. This means that the
expected number of nodes which reach the stack-top before the correct
node at level i grows exponentially with i. Hence, R does not
belong to R(K). □

Proof  of  Lemma  3.3.2. 

Let $K=(P;X_1,\dots,X_n;Y)$ be a PRC and f be an $M=(M_1,\dots,M_n,N)$ block code for K.

Let $f_i$ be the component code of f for user i, i=1,...,n. Let the codewords
of $f_i$ be indexed by integers 1 through $M_i$, and the codewords of f by
n-tuples of integers $(m_1,\dots,m_n)$ where $m_i\in\{1,\dots,M_i\}$. The words index and
message will be used interchangeably in what follows.

The codeword in f with index $(m_1,\dots,m_n)$ corresponds to a collection of
codewords, namely, codeword $m_i$ from code $f_i$ for each $i\in\{1,\dots,n\}$. The
codeword with index $m=(m_1,\dots,m_n)$ will be denoted by f(m), as usual.


Recall that

$$\lambda(K,f)=(1/H)\sum_{m}\sum_{\hat m}P\bigl(B(\hat m,m)\mid f(m)\bigr)\qquad(1)$$

where $H=M_1\cdots M_n$ is the total number of codewords, the summations run
through all possible messages for f, and $B(\hat m,m)$ is as defined in §3.1.


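Now, by Lemma 3.3.1, for any distinct pair of $m$ and $\hat m$,

$$P\bigl(B(\hat m,m)\mid f(m)\bigr)+P\bigl(B(m,\hat m)\mid f(\hat m)\bigr)\;\ge\;(g(N)/2)\sum_{\eta}\sqrt{P(\eta\mid f(m))\,P(\eta\mid f(\hat m))},$$

where the factor of 1/2 accounts for the fact that $B(\hat m,m)$ and $B(m,\hat m)$
have in common those $\eta$ for which $P(\eta\mid f(m))=P(\eta\mid f(\hat m))$.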


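Summing over all messages,

$$\sum_{m}\sum_{\hat m}P\bigl(B(\hat m,m)\mid f(m)\bigr)\;\ge\;(g(N)/2)\sum_{m}\sum_{\hat m\ne m}\sum_{\eta}\sqrt{P(\eta\mid f(m))\,P(\eta\mid f(\hat m))}$$
$$=(g(N)/2)\Bigl\{-H+\sum_{m}\sum_{\hat m}\sum_{\eta}\sqrt{P(\eta\mid f(m))\,P(\eta\mid f(\hat m))}\Bigr\}.\qquad(2)$$

This expression will now be simplified.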

Let Q be the composition of f, and $Q_i$ be that of $f_i$. The relationship
between Q and $Q_1,\dots,Q_n$ is a simple one: For any collection of
$\xi_i\in X_i^N$, $i\in\{1,\dots,n\}$, $Q(\xi_1\times\dots\times\xi_n)=Q_1(\xi_1)\cdots Q_n(\xi_n)$.

The following short-hand notation (which should be familiar by now) will
be used in the rest of the proof. For any subset T of {1,...,n} and any
$\xi_1\times\dots\times\xi_n$, where $\xi_i\in X_i^N$: $\xi_T$ will denote the collection of
$\xi_i$ for $i\in T$; $\xi$ will denote $\xi_1\times\dots\times\xi_n$; $Q(\xi_T)$ will denote
the product of $Q_i(\xi_i)$ over all $i\in T$. $P(\eta\mid\xi)$ and
$P(\eta\mid\xi_T,\xi_{T^c})$ will be used interchangeably.

For any message $m=(m_1,\dots,m_n)$ and any subset T of {1,...,n}, $T_m(T)$ will
denote the set of messages $\hat m=(\hat m_1,\dots,\hat m_n)$ for which $\hat m_i=m_i$ for each $i\notin T$.

Now,  we  can  proceed  with  the  proof. 

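$$\sum_{m}\sum_{\hat m}\sum_{\eta}\sqrt{P(\eta\mid f(m))\,P(\eta\mid f(\hat m))}$$
$$\ge\;\sum_{m}\sum_{\hat m\in T_m(T)}\sum_{\eta}\sqrt{P(\eta\mid f(m))\,P(\eta\mid f(\hat m))}$$

Here, T is a fixed but arbitrary subset of {1,...,n}.

$$=\sum_{m}\sum_{\hat\xi_T}\Bigl[\prod_{i\in T}M_iQ_i(\hat\xi_i)\Bigr]\sum_{\eta}\sqrt{P(\eta\mid f(m))\,P(\eta\mid\hat\xi_T,f(m)_{T^c})}$$

The summation over $\hat\xi_T$ should be thought of as one summation for each
element in T; the summation corresponding to an element i of T runs
through all of $X_i^N$. Now, let M(T) denote the product of all $M_i$ for $i\in T$.

$$=H\sum_{\xi}Q(\xi)\,M(T)\sum_{\hat\xi_T}Q(\hat\xi_T)\sum_{\eta}\sqrt{P(\eta\mid\xi)\,P(\eta\mid\hat\xi_T,\xi_{T^c})}$$

Recall that H is the total number of codewords in f. The summation over
$\xi$ runs through all of $(X_1\times\dots\times X_n)^N$.

$$=H\,M(T)\sum_{\xi_{T^c}}Q(\xi_{T^c})\sum_{\xi_T}Q(\xi_T)\sum_{\hat\xi_T}Q(\hat\xi_T)\sum_{\eta}\sqrt{P(\eta\mid\xi)\,P(\eta\mid\hat\xi_T,\xi_{T^c})}$$

Note that $Q(\xi)=Q(\xi_T)\,Q(\xi_{T^c})$.

$$=H\,M(T)\sum_{\xi_{T^c}}Q(\xi_{T^c})\sum_{\eta}\sum_{\xi_T}\sum_{\hat\xi_T}Q(\xi_T)\sqrt{P(\eta\mid\xi_T,\xi_{T^c})}\;Q(\hat\xi_T)\sqrt{P(\eta\mid\hat\xi_T,\xi_{T^c})}$$
$$=H\,M(T)\sum_{\xi_{T^c}}Q(\xi_{T^c})\sum_{\eta}\Bigl\{\sum_{\xi_T}Q(\xi_T)\sqrt{P(\eta\mid\xi_T,\xi_{T^c})}\Bigr\}^2$$
$$=H\exp N\bigl(R(T)-R_0(K,Q,T)\bigr),$$

where R(T) is defined as $(1/N)\ln M(T)$.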


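We have thus proved that, for any non-empty subset T of {1,...,n},

$$\sum_{m}\sum_{\hat m}\sum_{\eta}\sqrt{P(\eta\mid f(m))\,P(\eta\mid f(\hat m))}\;\ge\;H\exp N\bigl(R(T)-R_0(K,Q,T)\bigr).\qquad(3)$$

It follows that

$$\sum_{m}\sum_{\hat m}\sum_{\eta}\sqrt{P(\eta\mid f(m))\,P(\eta\mid f(\hat m))}\;\ge\;H\exp N\bigl(\max_{T}\{R(T)-R_0(K,Q,T)\}\bigr).\qquad(4)$$

Noting that $\max_{T}\{R(T)-R_0(K,Q,T)\}=-\delta(K,M,Q)$,

$$\sum_{m}\sum_{\hat m}\sum_{\eta}\sqrt{P(\eta\mid f(m))\,P(\eta\mid f(\hat m))}\;\ge\;H\exp N\bigl(\max_{T}\{R(T)-R_0(K,Q,T)\}\bigr)=H\exp\{-N\delta(K,M,Q)\}.\qquad(5)$$

Now, the lemma follows from (1), (2), and (5). □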
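The quantities $R_0(K,Q,T)$ and $\delta(K,M,Q)$ that drive the bound can be evaluated numerically from the expression reached at the end of the chain above. The following is a minimal sketch, assuming block length N=1 and 0-based user indices; the names `r0`, `delta` and `two_user_erasure` are hypothetical, and the two-user erasure channel is used only as a convenient test case.

```python
import itertools
import math

def r0(P, Q, T):
    """R0(K,Q,T) in nats for block length N = 1.

    P: dict {(x_1,...,x_n): {output: probability}}
    Q: list of dicts, Q[i][x] = probability that user i sends letter x
    T: non-empty set of 0-based user indices
    """
    n = len(Q)
    Ts = sorted(T)
    Tc = [i for i in range(n) if i not in T]
    outputs = {y for row in P.values() for y in row}
    total = 0.0
    for xc in itertools.product(*(Q[i] for i in Tc)):
        qc = math.prod(Q[i][v] for i, v in zip(Tc, xc))
        for y in outputs:
            inner = 0.0
            for xt in itertools.product(*(Q[i] for i in Ts)):
                x = [None] * n
                for i, v in zip(Tc, xc):
                    x[i] = v
                for i, v in zip(Ts, xt):
                    x[i] = v
                qt = math.prod(Q[i][v] for i, v in zip(Ts, xt))
                inner += qt * math.sqrt(P[tuple(x)].get(y, 0.0))
            total += qc * inner ** 2
    return -math.log(total)

def delta(P, Q, M):
    """Minimum over non-empty T of R0(K,Q,T) - R(T), with R(T) = sum of ln M_i."""
    n = len(Q)
    return min(
        r0(P, Q, set(T)) - sum(math.log(M[i]) for i in T)
        for r in range(1, n + 1)
        for T in itertools.combinations(range(n), r)
    )

def two_user_erasure(eps):
    """Two-user erasure channel: the input pair is erased with probability eps."""
    return {(a, b): {(a, b): 1 - eps, ("e", "e"): eps}
            for a in (0, 1) for b in (0, 1)}

eps = 0.5
P = two_user_erasure(eps)
Q = [{0: 0.5, 1: 0.5}, {0: 0.5, 1: 0.5}]
print(r0(P, Q, {0}), r0(P, Q, {0, 1}), delta(P, Q, [2, 2]))
```

A negative value of `delta` corresponds to the regime in which the lower bound of Lemma 3.3.2 grows exponentially with the block length.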




3.4. A Lower Bound to the Ensemble Average of Computation in
Sequential Decoding

Theorem 3.4.1. For any channel $K=(P;X_1,\dots,X_n;Y)$, any tree code
ensemble $\mathrm{Ens}(M_1,\dots,M_n;k;X_1,\dots,X_n;Q_1,\dots,Q_n)$, any metric $\Gamma$ that can be
used in sequential decoding of codes in the ensemble, and any positive integer t,

$$E\,A(K,e,\Gamma,t)\;\ge\;h(t)\exp\{-kt\,\delta(K,M,Q)\},$$

where E denotes expectation (here, E is an averaging operation over all
codes e in the ensemble); $h(t)=(g/\sqrt{t})+o(1/\sqrt{t})$ where g is a constant and
$o(1/\sqrt{t})$ is a quantity which goes to zero faster than $1/\sqrt{t}$ as t goes to infinity;
$M=(M_1,\dots,M_n,k)$; $Q=(Q_1,\dots,Q_n)$; and $\delta(K,M,Q)$ is as defined in §2.2.

Remarks 

1) There are certain similarities between Theorem 3.4.1 and the results
of §2.4, but neither is stronger than the other.

Theorem 2.4.1 and Lemma 2.4.1 together imply that, for branchwise
additive metrics, the method of §2.1 cannot be used to prove the
achievability of any point outside R0. The result here is much stronger.
Theorem 3.4.1 states that the inability to prove achievability outside R0
is not due to a shortcoming of the particular method employed in §2.1,
neither is it due to the restriction of the metrics to branchwise additive
ones. It is because random-coding arguments over the class of ensembles
we are considering in this thesis cannot yield any achievable points
outside R0; in this respect, the method of §2.1 cannot be improved.

Theorem 2.4.1 gives an outer bound to what can be shown to be
achievable by a given branchwise additive metric by using the method of
§2.1. Theorem 3.4.1, on the other hand, implicitly deals only with the
best possible metric.

2) In the one-user case, a result similar to Theorem 3.4.1 was proved by
Gallager in a different context [18].


Proof  of  Theorem  3.4.1. 

In view of Lemma 3.1.1, it is sufficient to prove that

$$E\,\lambda(K,e(t))\;\ge\;h(t)\exp\{-kt\,\delta(K,M,Q)\}.$$

Here, e(t) is the block code obtained by truncating the tree code e at
level t, as defined in §3.1. We associate messages for e(t) with level-t
nodes in e. Now, by definition,

$$\lambda(K,e(t))=(1/M(t))\sum_{u(..t)}\sum_{\hat u(..t)}P\bigl(B(\hat u(..t),u(..t))\mid e\,u(..t)\bigr),\qquad(1)$$

where M(t) is the total number of codewords in e(t), i.e., $M(t)=(M_1\cdots M_n)^t$;
the sums are over all level-t nodes in e; $e\,u(..t)$ denotes the codeword in
e(t) for message (node) u(..t); and $B(\hat u(..t),u(..t))$ is defined as follows.

$$B(\hat u(..t),u(..t))=\begin{cases}\{\eta\in Y^{kt}:\ P(\eta\mid e\,\hat u(..t))\ge P(\eta\mid e\,u(..t))\}, & \hat u(..t)\ne u(..t);\\ \emptyset, & \hat u(..t)=u(..t).\end{cases}$$

Taking expectations of both sides of (1),

$$E\,\lambda(K,e(t))=(1/M(t))\sum_{u(..t)}\sum_{\hat u(..t)}E\,P\bigl(B(\hat u(..t),u(..t))\mid e\,u(..t)\bigr).\qquad(2)$$

$E\,\lambda(K,e(t))$ can thus be lower-bounded by lower-bounding

$$E\,P\bigl(B(\hat u(..t),u(..t))\mid e\,u(..t)\bigr),$$

which is just the probability of the event that

$$\sum_{i=1}^{t}\ln\frac{P(y(i)\mid e\,\hat u(i))}{P(y(i)\mid e\,u(i))}\;\ge\;0.\qquad(3)$$

Here, y(i) denotes the ith channel output block, and it is regarded as a
random variable taking values in $Y^k$. Likewise, e is regarded as a random
variable taking values in the ensemble E.

The distribution of $Z_i=\ln[P(y(i)\mid e\,\hat u(i))/P(y(i)\mid e\,u(i))]$ depends on the
type of $\hat u(..i)$ with respect to u(..i). In order to simplify matters, let us
suppose that the type of $\hat u(..t)$ with respect to u(..t) is (T,...,T) for some
non-empty subset T of {1,...,n}. $Z_1,\dots,Z_t$ are then independent and
identically distributed. So, the probability of the event in (3), which is
now just the probability that t iid random variables, $Z_1,\dots,Z_t$,
have a non-negative sum, can be lower-bounded by using the asymptotic
form of the Chernoff bound, as given by equations 5.4.23 and 5.4.24 of
[12]. To use the Chernoff bound, we note that the moment-generating
function of $Z_1$, $E(\exp sZ_1)$, is as follows.


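$$E(\exp sZ_1)=\sum_{\eta}\sum_{\xi_{T^c}}\sum_{\xi_T}\sum_{\hat\xi_T}Q(\xi_{T^c})\,Q(\xi_T)\,Q(\hat\xi_T)\,P(\eta\mid\xi_T,\xi_{T^c})^{1-s}\,P(\eta\mid\hat\xi_T,\xi_{T^c})^{s},$$

where we have used the notation of §3.3.

It can be verified easily that $E(\exp sZ_1)$ is a convex function of s with a
minimum at s=1/2; thus, the minimum value of $E(\exp sZ_1)$ equals

$$E\{\exp(Z_1/2)\}=\exp\{-kR_0(K,Q,T)\}.$$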
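The claim that the moment-generating function is minimized at s=1/2, with minimum value $\exp\{-kR_0\}$, is easy to check numerically. The sketch below assumes the simplest setting (one user, T={1}, k=1, a binary symmetric channel with uniform inputs); the names `mgf` and `r0` are hypothetical and serve only this illustration.

```python
import math

def mgf(P, Q, s):
    """E exp(s Z_1) for a single user (T = {1}) and k = 1, with 0 < s < 1."""
    outputs = {y for row in P.values() for y in row}
    total = 0.0
    for y in outputs:
        for x, qx in Q.items():
            for xh, qh in Q.items():
                a, b = P[x].get(y, 0.0), P[xh].get(y, 0.0)
                # For 0 < s < 1, a term with a zero probability vanishes.
                if a > 0.0 and b > 0.0:
                    total += qx * qh * a ** (1 - s) * b ** s
    return total

def r0(P, Q):
    """Single-user cut-off rate parameter in nats for input distribution Q."""
    outputs = {y for row in P.values() for y in row}
    return -math.log(sum(
        sum(q * math.sqrt(P[x].get(y, 0.0)) for x, q in Q.items()) ** 2
        for y in outputs))

# Assumed example: crossover probability 0.1, uniform inputs.
BSC = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}
Q = {0: 0.5, 1: 0.5}

grid = [i / 20 for i in range(1, 20)]
s_star = min(grid, key=lambda s: mgf(BSC, Q, s))
print(s_star, mgf(BSC, Q, 0.5), math.exp(-r0(BSC, Q)))
```

The symmetry of the sum under exchanging the two input arguments together with s and 1-s is what places the minimum at s=1/2.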

Now, the Chernoff bound states that

$$\Pr\{Z_1+\dots+Z_t\ge 0\}\;\ge\;H(t)\exp\{-tk\,R_0(K,Q,T)\},\qquad(4)$$

where H(t) is of the form $(\alpha/\sqrt{t})+o(1/\sqrt{t})$ for some constant $\alpha$. (For
the exact form of H(t), see page 130 of [12].)

Note that $\exp\{k(t-1)R(T)\}$, where $R(T)=(1/k)\ln M(T)$, lower-bounds the
number of level-t nodes which are of type (T,...,T) with respect to (any
given) node u(..t). Thus, it follows from (2), (3), and (4) that

$$E\,\lambda(K,e(t))\;\ge\;H(t)\exp\{-kR(T)\}\exp\{kt[R(T)-R_0(K,Q,T)]\},\qquad(5)$$

which is true for any non-empty subset T of {1,...,n}.


Now, lower-bounding $H(t)\exp\{-kR(T)\}$ by $h(t)=H(t)/(M_1\cdots M_n)$ and taking a T
in (5) for which

$$R(T)-R_0(K,Q,T)=\max\{R(S)-R_0(K,Q,S):\ S\ \text{is a non-empty subset of}\ \{1,\dots,n\}\}=-\delta(K,M,Q),$$

we obtain

$$E\,\lambda(K,e(t))\;\ge\;h(t)\exp\{-kt\,\delta(K,M,Q)\},\qquad(6)$$

thus concluding the proof.


Chapter  4 


NON-JOINT SEQUENTIAL DECODING

The  sequential  decoding  procedure  that  we  have  been  considering  in  the 
past  chapters  -  joint  sequential  decoding  (JSD),  as  it  will  be  called  in 
this  chapter  -  requires  a  complete  knowledge  of  all  tree  codes  in  the 
system  on  the  part  of  a  single  processor.  In  this  section,  we  shall 
consider  what  we  call  non-joint  sequential  decoding  (NJSD)  in  which 
there  is  a  separate  sequential  decoder  for  each  user,  the  decoder  for  any 
given  user  working  only  on  that  user's  tree  code.  (See  Figure  4.1.)  Our 
goal  is  to  examine  the  achievable  rate  region  of  NJSD  (to  be  defined 
presently)  and  compare  it  with  that  of  JSD. 

Consider a channel $K=(P;X_1,\dots,X_n;Y)$ and suppose that user i employs a
$(M_i,k)$ tree code $e_i$, i=1,...,n. Let e denote the joint tree code for $e_1,\dots,e_n$.

NJSD in this situation uses n sequential decoders. The sequential decoder
working on user i's tree code $e_i$, which we denote by $SD_i$, uses a metric
$\Gamma_i$ of the form

$$\Gamma_i:\ \bigcup_{h=1}^{\infty}\bigl(X_i^{hk}\times Y^{hk}\bigr)\to[-\infty,\infty).$$

Note that the form of $\Gamma_i$ does not allow $SD_i$ to use any information about
the codes of the other users.

Roughly speaking, achievability in NJSD requires that the average
decoding complexity be finite for each $SD_i$, i=1,...,n. What follows is a
formalization of this idea.


Achievability in Non-Joint Sequential Decoding

Let $C_{i,j}(K,e,\Gamma_i,s,y)$ be the number of nodes in $I_j(s_i)$, the jth incorrect
subset of the correct path $s_i$ in $e_i$, that reach the stack-top of $SD_i$.
(As usual, $s=s_1\times\dots\times s_n$ denotes the correct path in e, and y denotes the
channel output sequence.)

Figure 4.1. Joint and Non-Joint Sequential Decoding
(Block diagram. In both panels, Source 1 through Source n feed Enc. 1 through
Enc. n, whose outputs enter the channel. In joint sequential decoding a single
decoder decodes the n-user tree code; in non-joint sequential decoding there is
a separate decoder for each user, the first of which decodes tree 1.)

Let $C_{i,j}(K,e,\Gamma_i)$ be the value of $C_{i,j}(K,e,\Gamma_i,s,y)$ averaged over s and y. Let

$$D_{i,L}(K,e,\Gamma_i)=\{C_{i,1}(K,e,\Gamma_i)+\dots+C_{i,L}(K,e,\Gamma_i)\}/L.$$

For large L, $D_{i,L}(K,e,\Gamma_i)$ can be interpreted as the average work $SD_i$ has
to do to move one step along the correct path $s_i$.

A point $R=(R_1,\dots,R_n)$ is said to be achievable by NJSD if there exists a
finite constant A, A=A(K,R), such that for any given L there exist i) a
code e with rate at least as large as R, and ii) metrics $\Gamma_1,\dots,\Gamma_n$ such
that

$$D_{1,L}(K,e,\Gamma_1)+\dots+D_{n,L}(K,e,\Gamma_n)<A.$$

The achievable rate region of NJSD is defined as the closure of the set
of all points achievable by NJSD, and is denoted by $R_{nj}(K)$.

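Theorem 4.1. $R_{nj}(K)$ is inner-bounded by $R_{nj,0}(K)$, which is defined as
follows.

$$R_{nj,0}(K)=\bigcup_{Q}R_{nj,0}(K,Q),$$

where the union is over all $Q=(Q_1,\dots,Q_n)$ such that $Q_i$ is a p.d. on $X_i^k$ for
some k (k is the same for each i), and for any such Q,

$$R_{nj,0}(K,Q)=\{(R_1,\dots,R_n):\ 0\le R_i\le R_{nj,0}(K,Q,i)\ \text{for each}\ i=1,\dots,n\},$$

where

$$R_{nj,0}(K,Q,i)=-(1/k)\ln\sum_{\eta\in Y^k}\Bigl\{\sum_{\xi_i\in X_i^k}Q_i(\xi_i)\sqrt{P_i(\eta\mid\xi_i)}\Bigr\}^2,$$

and where

$$P_i(\eta\mid\xi_i)=\sum_{\xi_1}\cdots\sum_{\xi_{i-1}}\sum_{\xi_{i+1}}\cdots\sum_{\xi_n}\Bigl[\prod_{j\ne i}Q_j(\xi_j)\Bigr]P(\eta\mid\xi_1,\dots,\xi_n).$$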
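To make the definition concrete, the per-user bound can be computed by first averaging the channel over the other users' inputs and then applying the single-user cut-off rate expression. The following is a minimal sketch, assuming k=1 and 0-based user indices; the names `averaged_channel`, `r_nj0` and `two_user_erasure` are hypothetical, and the two-user erasure channel is only an assumed test case.

```python
import math

def averaged_channel(P, Q, i):
    """Channel seen by user i when the other users' letters are drawn from Q.

    P: dict {(x_1,...,x_n): {output: probability}}
    Q: list of dicts, Q[j][x] = probability that user j sends letter x
    """
    Pi = {x: {} for x in Q[i]}
    for xs, row in P.items():
        weight = math.prod(Q[j][xs[j]] for j in range(len(Q)) if j != i)
        for y, p in row.items():
            Pi[xs[i]][y] = Pi[xs[i]].get(y, 0.0) + weight * p
    return Pi

def r_nj0(P, Q, i):
    """Per-user bound in nats, for k = 1."""
    Pi = averaged_channel(P, Q, i)
    outputs = {y for row in Pi.values() for y in row}
    total = sum(
        sum(Q[i][x] * math.sqrt(Pi[x].get(y, 0.0)) for x in Q[i]) ** 2
        for y in outputs
    )
    return -math.log(total)

def two_user_erasure(eps):
    """Two-user erasure channel: the input pair is erased with probability eps."""
    return {(a, b): {(a, b): 1 - eps, ("e", "e"): eps}
            for a in (0, 1) for b in (0, 1)}

eps = 0.25
P = two_user_erasure(eps)
Q = [{0: 0.5, 1: 0.5}, {0: 0.5, 1: 0.5}]
print(r_nj0(P, Q, 0), -math.log((1 + eps) / 2))
```

For this channel with uniform inputs, the computed value agrees with the closed form $-\ln[(1+\epsilon)/2]$ nats for each user.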

Proof. We use a random-coding argument that is essentially the same as
the one in §2.1. Hence, details of the following proof are omitted.

Let $E=\mathrm{Ens}(M_1,\dots,M_n;k;X_1,\dots,X_n;Q_1,\dots,Q_n)$ be an arbitrary ensemble such that
$R_{nj,0}(K,Q,j)>(1/k)\ln M_j$ for each j=1,...,n. To prove the theorem, it suffices
to prove that there exist metrics $\Gamma_1,\dots,\Gamma_n$ such that the expected value
of $D_{1,L}(K,e,\Gamma_1)+\dots+D_{n,L}(K,e,\Gamma_n)$ over E is uniformly bounded over all L.
Simpler yet, it suffices to prove that, for any given i, there exists $\Gamma_i$
such that the expected value of $D_{i,L}(K,e,\Gamma_i)$ over E is uniformly bounded
over all L. Without loss of generality, we may consider the expected
value of $D_{1,L}(K,e,\Gamma_1)$ over E, as we do next.

Let $E_i=\mathrm{Ens}(M_i;k;X_i^k;Q_i)$, i=1,...,n. Let E denote expectation over the
ensemble E, and $E_i$ denote expectation over the ensemble $E_i$. Now,

$$E\,D_{1,L}(K,e,\Gamma_1)=E_1\cdots E_n\,D_{1,L}(K,e,\Gamma_1)$$
$$=E_1\{E_2\cdots E_n\,D_{1,L}(K,e,\Gamma_1)\}$$
$$=E_1\,D_L(K_1,e_1,\Gamma_1),$$

where $D_L$ is as in Def. 1.4.1, and $K_1=(P_1;X_1^k;Y^k)$ with $P_1$ as follows.

For each $\xi_1\in X_1^k$ and $\eta\in Y^k$,

$$P_1(\eta\mid\xi_1)=\sum_{\xi_2}\cdots\sum_{\xi_n}Q_2(\xi_2)\cdots Q_n(\xi_n)\,P(\eta\mid\xi_1,\dots,\xi_n).$$

If $\Gamma_1$ is taken as $\mathrm{met}(K_1,1,Q_1,B)$, with bias $B=\{R_0(K_1,Q_1,\{1\})+\ln M_1\}/2$,
then $E_1D_L(K_1,e_1,\Gamma_1)$ is uniformly bounded over all L by the results of §2.2.

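Remarks

1) The branch metric for $\mathrm{met}(K_1,1,Q_1,B)$ is as follows.
For each $\eta\in Y^k$, $\xi\in X_1^k$, the metric assigned to the pair $(\xi,\eta)$ is

$$\ln\frac{\sqrt{P_1(\eta\mid\xi)}}{\sum_{\zeta\in X_1^k}Q_1(\zeta)\sqrt{P_1(\eta\mid\zeta)}}-B.$$

2) $R_{nj,0}(K,Q)$ is also achievable if $SD_i$ uses the following Fano metric.
For each $\eta\in Y^k$, $\xi\in X_i^k$, the metric assigned to the pair $(\xi,\eta)$ is

$$\ln\frac{P_i(\eta\mid\xi)}{\omega_i(\eta)}-\ln M_i,$$

where $\omega_i(\eta)=\sum_{\zeta\in X_i^k}Q_i(\zeta)\,P_i(\eta\mid\zeta)$. □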

One might think that R(K) must be at least as large as Rnj(K) for all K.
This  is  not  true.  There  is  no  general  inclusion  relationship  between  R 
and  Rnj,  as  illustrated  by  the  following  examples. 

Example  4.1.  A  channel  for  which  R  is  not  contained  in  Rnj. 

Consider a channel $K=(P;X_1,X_2;Y_1\times Y_2)$ (Figure 4.2) where $X_1=X_2=Y_1=Y_2=\{0,1\}$
and the transition probabilities are as follows.

$P((\xi,0)\mid(\xi,0))=1-\epsilon$, $\xi=0,1$;

$P((\xi,1)\mid(\xi,0))=\epsilon$, $\xi=0,1$;

$P((\eta,1)\mid(\xi,1))=1-\epsilon$, $\eta=1$ and $\xi=0$, or $\eta=0$ and $\xi=1$;

$P((\eta,0)\mid(\xi,1))=\epsilon$, $\eta=1$ and $\xi=0$, or $\eta=0$ and $\xi=1$;

all other transitions have zero probability.


Figure 4.2. The two-user channel of Example 4.1.
(Transition diagram from the input pairs (0,0), (1,1), (1,0), (0,1) to the
output pairs (0,0), (0,1), (1,0), (1,1).)


Thus, in a sense, the input by the second user selects the channel for the
first user. If $(\xi_1,\xi_2)$ is the channel input and $(\eta_1,\eta_2)$ is the channel
output at a given time, the transition probabilities from $\xi_2$ to $\eta_2$ are the
same as those of a binary symmetric channel with probability of error $\epsilon$.
If $\xi_2=0$, one has $\eta_1=\xi_1$ with probability one; if $\xi_2=1$, then one has
$\eta_1\ne\xi_1$ with probability one.

In order to decode the message of user 1, it is sufficient to decode that
of user 2. So, any two-user rate (1 bit, $R_2$ bits) for which $R_2$ is smaller
than the cut-off rate of a binary symmetric channel with probability of
error $\epsilon$, namely $1-2\log_2\{\sqrt{\epsilon}+\sqrt{1-\epsilon}\}$ bits, is achievable by JSD.
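The cut-off rate expression quoted above is simple to evaluate. A minimal sketch follows; the function name `bsc_cutoff_rate_bits` is hypothetical, and the second function is the algebraically equivalent form $1-\log_2(1+2\sqrt{\epsilon(1-\epsilon)})$, included only as a cross-check.

```python
import math

def bsc_cutoff_rate_bits(eps):
    """Cut-off rate of a binary symmetric channel, in bits per channel use."""
    return 1.0 - 2.0 * math.log2(math.sqrt(eps) + math.sqrt(1.0 - eps))

def bsc_cutoff_rate_bits_alt(eps):
    """Same quantity, using (sqrt(eps) + sqrt(1-eps))^2 = 1 + 2 sqrt(eps (1-eps))."""
    return 1.0 - math.log2(1.0 + 2.0 * math.sqrt(eps * (1.0 - eps)))

for eps in (0.0, 0.01, 0.1, 0.5):
    print(eps, bsc_cutoff_rate_bits(eps))
```

The rate equals 1 bit for a noiseless channel and falls to 0 at $\epsilon=1/2$, so the admissible range of $R_2$ in the example shrinks as the channel from $\xi_2$ to $\eta_2$ becomes noisier.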

If user 1 transmits at a rate of 1 bit, any decoder that decodes user 1's
message correctly must produce (as a by-product) a correct decoding of
user 2's message, whether or not we are interested in that message.
Therefore, no two-user rate of the form (1 bit, $R_2$ bits) is achievable by
NJSD if $R_2$ is positive. More precisely, if ($R_1$ bits, $R_2$ bits) is achievable
by NJSD, then $R_1$ must be smaller than $1-\delta(R_2)$ bits, where $\delta$ is a
function such that $\delta(R_2)>0$ for $R_2>0$.

Example 4.2. A channel for which Rnj is not contained in R.

Consider a channel $K=(P;X_1,X_2;Y_1\times Y_2)$ (Figure 4.3) where $X_1=X_2=\{0,1\}$,
$Y_1=Y_2=\{0,1,e\}$, and the transition probabilities are

$$P((\xi_1,\xi_2)\mid(\xi_1,\xi_2))=1-\epsilon\quad\text{and}\quad P((e,e)\mid(\xi_1,\xi_2))=\epsilon\quad\text{for each pair of}\ \xi_1,\xi_2.$$

The output symbol (e,e) is called an erasure and $\epsilon$ is called the erasure
probability. We assume that $\epsilon$ satisfies $0<\epsilon<1$.

Figure 4.3. The two-user channel of Example 4.2.
(Transition diagram: each input pair (0,0), (0,1), (1,0), (1,1) goes either to
the identical output pair or to the erasure (e,e).)

An outer bound to R(K) is found by observing that, if $(R_1,R_2)$ belongs to
R(K), then $R_1+R_2$ cannot be larger than $R_0(K_4)=-\ln\{(1+3\epsilon)/4\}$ nats, where
$K_4$ is the single-user quaternary erasure channel, and $R_0(K_4)$ is the
cut-off rate of $K_4$.

By Theorem 4.1, $R_{nj}(K)$ is inner-bounded by $R_{nj,0}(K,Q)$ for any Q, in
particular for $Q^*=(Q_1,Q_2)$ where $Q_1=Q_2$=the uniform distribution on
{0,1}. By simple calculation, $R_{nj,0}(K,Q^*)=\{(R_1,R_2):\ 0\le R_1\le-\ln[(1+\epsilon)/2]$ nats,
$0\le R_2\le-\ln[(1+\epsilon)/2]$ nats$\}$.*

* Actually, $R_{nj,0}(K,Q^*)=R_{nj,0}(K)$; but we do not need this fact here.


Figure 4.4 shows the above bounds. We notice that there are points in
the neighborhood of $(-\ln[(1+\epsilon)/2],-\ln[(1+\epsilon)/2])$ which belong to $R_{nj}(K)$ but
not to R(K), since $-2\ln[(1+\epsilon)/2]>-\ln[(1+3\epsilon)/4]$ for any $\epsilon$, $0<\epsilon<1$.
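The strict inequality behind this observation can be checked over a grid of erasure probabilities. The sketch below uses only the two closed-form expressions quoted above; the function names are hypothetical.

```python
import math

def njsd_corner_sum(eps):
    """Sum rate at the corner point of the NJSD inner bound, in nats."""
    return -2.0 * math.log((1.0 + eps) / 2.0)

def jsd_sum_bound(eps):
    """Upper bound on the sum rate achievable by JSD, in nats."""
    return -math.log((1.0 + 3.0 * eps) / 4.0)

# Gap between the two quantities for eps = 0.01, 0.02, ..., 0.99.
gaps = [njsd_corner_sum(e / 100) - jsd_sum_bound(e / 100) for e in range(1, 100)]
print(min(gaps) > 0.0)
```

The inequality reduces to $((1+\epsilon)/2)^2<(1+3\epsilon)/4$, that is, $\epsilon^2<\epsilon$, which holds exactly on the open interval $0<\epsilon<1$.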


Figure  4.4.  Inner  and  outer  bound  regions  of  Example  4.2. 


Complementary  Remarks  on  Example  4.2 

1)  Example  4.2  may  seem  paradoxical:  How  can  two  sequential  decoders, 
neither  with  a  complete  view  of  the  system,  achieve  a  point  that  is  not 
achievable  by  JSD?  This  can  be  explained  as  follows. 

Let $e_1$ be the code for user 1, and $e_2$ be the one for user 2. Let e be the
joint tree code for $e_1$ and $e_2$. Let k be the number of channel symbols per
branch.

The channel output here is a sequence of pairs of symbols: $(\eta_{11},\eta_{21})$,
$(\eta_{12},\eta_{22})$, $(\eta_{13},\eta_{23})$, .... We shall denote the sequence
$\eta_{11},\eta_{12},\eta_{13},\dots$ by $y_1$. The first kt elements of $y_1$ will be denoted
by $y_1(..t)$. $\eta_{21},\eta_{22},\eta_{23},\dots$ will be denoted by $y_2$, and the first kt
elements of $y_2$ by $y_2(..t)$.


A node $u_1(..t)$ in $e_1$ is said to be consistent if $e_1u_1(..t)$ agrees with $y_1(..t)$
in the unerased digits. A node $u_2(..t)$ in $e_2$ is said to be consistent if
$e_2u_2(..t)$ agrees with $y_2(..t)$ in the unerased digits. A node $u_1(..t)\times u_2(..t)$
in e is said to be consistent if $u_1(..t)$ and $u_2(..t)$ are consistent. Let
$W_1(y_1(..t))$, $W_2(y_2(..t))$, and $W(y_1\times y_2(..t))$ denote the number of consistent
level-t nodes in $e_1$, $e_2$, and e, respectively. Note the identity $W=W_1W_2$.

Conditional on $y_1(..t)$, all consistent level-t nodes in $e_1$ are equally likely
to be correct. Thus, $W_1(y_1(..t))/2$ is a lower bound to the number of
level-t nodes in $e_1$ that reach the stack-top of $SD_1$ in NJSD. (The
reasoning here is the same as that leading to Lemma 3.1.1.) On the other
hand, $W_1$ is an upper bound on the same number of nodes provided that
$SD_1$ uses, as we assume that it does, a metric that assigns $-\infty$ to
inconsistent nodes, thus preventing them from ever reaching the
stack-top.

Similarly, the number of level-t nodes in $e_2$ that reach the stack-top of
$SD_2$ is lower-bounded by $W_2(y_2(..t))/2$ and upper-bounded by $W_2(y_2(..t))$.
And the number of level-t nodes in e that reach the stack-top in JSD is
lower-bounded by $W(y_1\times y_2(..t))/2$ and upper-bounded by $W(y_1\times y_2(..t))$.

What is of interest for our discussion is that 1) $W(y_1\times y_2(..t))/2$ is a
lower bound to the number of level-t nodes in e that are processed in
JSD, and 2) $W_1(y_1(..t))+W_2(y_2(..t))$ is an upper bound on the number of
level-t nodes in $e_1$ and $e_2$ that are processed in NJSD. Since $W=W_1W_2$ and
both $W_1$ and $W_2$ are at least 1, we have $W/2\ge(W_1+W_2)/4$. It is thus clear
that the complexity of JSD is greater than one fourth the combined
complexity of $SD_1$ and $SD_2$. The conclusion that follows is that R(K) must
be a subset of $R_{nj}(K)$.
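The inequality $W/2\ge(W_1+W_2)/4$ used in this argument follows in two steps from $W_1,W_2\ge 1$; the short derivation below is spelled out only for convenience.

```latex
\begin{align*}
(W_1-1)(W_2-1)\ge 0 \;&\Longrightarrow\; W_1W_2 \ge W_1+W_2-1,\\
W_1+W_2\ge 2 \;&\Longrightarrow\; W_1+W_2-1 \ge \tfrac12\,(W_1+W_2),\\
\text{hence}\quad \tfrac12\,W=\tfrac12\,W_1W_2 \;&\ge\; \tfrac14\,(W_1+W_2).
\end{align*}
```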

2) Example 4.2 was inspired by Massey's paper on sequential decoding for
single-user M'ary erasure channels [15]. Massey observed that, if $M=2^L$,
then an M'ary erasure channel decomposes into L completely correlated
binary erasure channels (BEC), as illustrated in Figure 4.5 for L=2. The
component BEC's are completely correlated in the sense that an erasure
in one means an erasure in all.

Figure 4.5. Decomposition of a quaternary erasure channel.
(Inputs (A,a), (A,b), (B,a), (B,b); outputs (A,a), (A,b), (B,a), (B,b), and the
erasure (E,e).)


The cut-off rate of an M'ary erasure channel with erasure probability $\epsilon$
equals $R_0(M)=-\ln[\epsilon+(1-\epsilon)/M]$ nats. If one uses separate sequential
decoders on each component BEC of a $2^L$'ary erasure channel, one can
then achieve rates up to $LR_0(2)=-L\ln[(1+\epsilon)/2]$ nats. On the other hand, if
sequential decoding is used directly on a $2^L$'ary erasure channel, then the
achievable rates are upper-bounded by $R_0(2^L)$. But $LR_0(2)>R_0(2^L)$ for any
$\epsilon$, $0<\epsilon<1$, and any $L\ge 2$. In fact, $LR_0(2)/R_0(2^L)$ goes to infinity as L increases.
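The growth of the ratio $LR_0(2)/R_0(2^L)$ can be tabulated directly from the cut-off rate formula above. A minimal sketch, with hypothetical function names and an assumed erasure probability of 0.5:

```python
import math

def r0_erasure(M, eps):
    """Cut-off rate (nats) of an M-ary erasure channel with erasure probability eps."""
    return -math.log(eps + (1.0 - eps) / M)

def ratio(L, eps):
    """L * R0(2) divided by R0(2^L): separate binary decoders versus one 2^L-ary decoder."""
    return L * r0_erasure(2, eps) / r0_erasure(2 ** L, eps)

eps = 0.5
ratios = [ratio(L, eps) for L in range(1, 9)]
print(ratios)
```

The denominator stays below $-\ln\epsilon$ for every L while the numerator grows linearly in L, which is why the ratio is unbounded.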

An  explanation  for  this  apparent  peculiarity  can  be  given  in  exactly  the 
same  way  as  has  been  done  for  Example  4.2.  The  conclusions  that  can  be 
drawn  from  Massey's  observation  are  that  i)  one  cannot  talk  about  a 
cut-off  rate  for  single-user  channels  without  being  explicit  about  the 
sequential  decoding  scheme  one  has  in  mind,  and  ii)  the  cut-off  rate  of 
ordinary sequential decoding does not constitute a limit, even in an
approximate sense, to rates at which reliable communication is possible
in practice.




Chapter  5 

SUGGESTIONS  FOR  FURTHER  RESEARCH 


1. Determine whether R(K)=R0(K) for all K.

2. Determine whether R is convex. Note that, if R is indeed convex,
proving that it is convex does not necessarily require an explicit
characterization of R.

3. Determine whether R0(K)=convex-hull R0(K,1) for all K.

4. Determine whether strong achievability (Def. 1.4.3) is equivalent to
achievability (Def. 1.4.2).

5. The metric of §2.2 requires that, in order to maintain achievability as
$\delta$, the distance between the desired rate and the "outer" boundary of R0,
goes to zero, the number of channel symbols per branch increase without
bound. Determine whether this requirement, which does not exist in the
single-user case, is inherent in multi-user sequential decoding.

A result in this regard, which is not reported in this thesis, is that
there is no metric that 1) satisfies the sufficient conditions of §2.1
over a region whose closure is R0, and 2) does not require the number of
symbols per branch go to infinity as $\delta$ goes to zero.

6.  A  simulation  study  of  multi-user  sequential  decoding  may  be  done  to 
obtain a better idea about its complexity. The analytical upper bounds of
this  thesis  are  useful  for  determining  whether  the  average  complexity  is 
finite;  but  they  are  too  weak  to  give  an  idea  about  the  actual  average 
complexity.  Furthermore,  a  simulation  study  would  provide  information 
about  the  dynamic  behavior  of  multi-user  sequential  decoders,  a  difficult 
subject  to  approach  analytically. 


7. The non-joint sequential decoding scheme of Chapter 4 is just one of
several possible approaches to sequential decoding with multiple
processors. It would be interesting to see what could be gained by
letting the processors exchange information about their current
estimates. Such schemes are not likely to be analytically tractable; but
that should not deter one from exploring these potentially more powerful
schemes.